Abstract

We present an algorithm for test generation tools based on symbolic execution. The algorithm
is intended to help in situations where a tool repeatedly fails to cover some code by tests. The
algorithm then provides the tool with a necessary condition that strongly narrows the space of program
paths which must be checked to reach the uncovered code. We also discuss the integration of the algorithm
into such tools, and we provide experimental results showing that the algorithm has the potential to be
valuable in these tools when properly implemented there.

1 Introduction 

Symbolic execution serves as a basis of many successful tools for test generation, including Klee [5], Exe [7],
Pex [28], Sage [13], and Cute [27]. These tools can relatively quickly find tests that cover the majority of the
code close to the program entry location. But then the ratio of covered code increases very slowly or not at all.
The reason is the huge number of program paths to be explored: it typically becomes very difficult
to find a path to a given yet uncovered program location among all those paths. This is known as the path
explosion problem.

In this paper we introduce an algorithm for the tools mentioned above which can be very useful in
situations where all attempts to cover a particular program location repeatedly fail (so a tool stops
making progress). Given that program location, our algorithm computes a nontrivial necessary condition
over-approximating the set of program paths leading to that location. The intention is to keep the
over-approximation as small as possible, while still keeping the condition simple for SMT solvers. Given this
condition, a tool can quickly recover from the failing situation by exploring only paths satisfying it.

For a given program and a target location in it, we construct the necessary condition by collecting
constraints appearing along acyclic program paths from the entry location to the target one, while summarizing
the effects of loops along them. It is well known that loops are the main source of the path explosion problem.
Therefore, the key part of the algorithm is the computation of loop summaries.

The algorithm is meant to be integrated into the tools as one more heuristic. Therefore, the complexity
of the integration also matters. We show that, since the algorithm computes a formula, the integration
is straightforward. We also show on a small set of representative benchmarks that Pex could benefit
from the algorithm when it is properly implemented there. The results can be extrapolated to the remaining
tools, since they share a common theoretical background.

2 Program 

Program definition A program is a tuple $P = (V_P, E_P, l_s, l_t, \iota_P)$ such that $(V_P, E_P)$ is a connected
oriented graph, where vertices $V_P$ represent program locations and edges $E_P$ represent control flow between them.
$P$ has a single start vertex $l_s \in V_P$ and a single target vertex $l_t \in V_P$, satisfying $l_s \neq l_t$. Each
vertex has out-degree at most 2. A vertex is a branching vertex if its out-degree is exactly 2. All other vertices,
except $l_t$, have out-degree 1. The in-degree of $l_s$ and the out-degree of $l_t$ are both 0. The function
$\iota_P : E_P \to \mathcal{I}$ assigns to each edge $e$ a single instruction $\iota_P(e)$ from the set $\mathcal{I}$
of all instructions. Out-edges of any branching vertex are labelled with instructions assume($\gamma$) and
assume($\neg\gamma$), where $\gamma$ is a boolean expression. Any other edge (i.e. a non-branching one) is labelled
either with an assignment instruction $e_1 \leftarrow e_2$, where $e_1, e_2$ are l-value and r-value expressions
respectively, or with an assertion assert($\psi$) or an assumption assume($\psi$), for some boolean expression
$\psi$, or with a skip instruction, which does nothing. We assume that expressions in program instructions have no
side effects. Without loss of generality we require that boolean expressions in assume and assert instructions
contain no logical connectives (i.e. they are predicates). We further require that the semantics of all instructions
in $\mathcal{I}$ uses only linear integer arithmetic and arrays. Note that $P$ contains neither function calls nor
pointer arithmetic. We can supply a precondition $\varphi$ and a postcondition $\psi$ for $P$ by introducing new
start and target vertices $l_s$, $l_t$ and connecting them to the old ones by two edges. The only out-edge of the new
$l_s$ and the only in-edge of the new $l_t$ are then labelled with the instructions assume($\varphi$) and
assert($\psi$) respectively. When the program $P$ is known from the context, we often omit the index $P$ and write
simply $V$, $E$, $l_s$, $l_t$, and $\iota$.

Treating lists as arrays Let us first consider an array $A$. If we define a successor function succ on elements
of $A$, the $k$-th element of $A$, commonly written as $A[k]$, can be identified by $succ^k(A)$. Note that $succ^k(x)$
represents a composition of $k$ applications of succ starting at $x$. Let us now consider a list $L$ with a successor
function next. Then the $k$-th element of $L$ can be identified by $next^k(L)$. Therefore, we can use the notation
$L[k]$ for lists as well. Because of this equivalence in treating lists and arrays, we consider only arrays in the
remainder of the text. It is also important to note that we do not provide shape analysis. Thus the shapes of
lists and arrays are assumed to be immutable.

Assertions and assumptions in a program Suppose that we execute a program symbolically. Let $\varphi$ be
a path condition. Then assert($\gamma$) forces a validity check of the formula $\varphi \to \gamma$. The execution
may continue only if the check succeeds. Note that the path condition is not updated. On the other hand,
assume($\gamma$) updates the path condition such that $\varphi \leftarrow \varphi \wedge \gamma$, and the execution
may continue if the updated $\varphi$ is satisfiable.
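
The difference between the two instructions can be illustrated by a minimal Python sketch. It is only an illustration under strong assumptions: the path condition is a list of Python predicates, and a brute-force search over a small, hypothetical finite domain (`DOMAIN`, `VARIABLES`) stands in for the SMT solver a real tool would call. The names `models`, `execute_assume`, and `execute_assert` are not from the paper.

```python
from itertools import product

# Hypothetical finite input domain; brute force stands in for an SMT solver.
DOMAIN = range(-3, 4)
VARIABLES = ("x", "y")


def models(path_condition):
    """Yield every assignment over DOMAIN that satisfies all conjuncts."""
    for values in product(DOMAIN, repeat=len(VARIABLES)):
        env = dict(zip(VARIABLES, values))
        if all(conjunct(env) for conjunct in path_condition):
            yield env


def execute_assume(path_condition, gamma):
    """assume(gamma): extend the path condition; continue iff it stays satisfiable."""
    updated = path_condition + [gamma]
    return updated, any(True for _ in models(updated))


def execute_assert(path_condition, gamma):
    """assert(gamma): check validity of (path condition -> gamma); no update."""
    return path_condition, all(gamma(env) for env in models(path_condition))


pc, may_continue = execute_assume([], lambda env: env["x"] > 0)
_, holds = execute_assert(pc, lambda env: env["x"] >= 1)   # implied by x > 0
_, fails = execute_assert(pc, lambda env: env["y"] > 0)    # not implied
```

Validity and satisfiability are decided here only relative to the finite domain, so the sketch shows the control flow of the two instructions rather than a sound decision procedure.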

Program variables and expressions Let $P$ be a program. Then $\mathcal{V}_P = \{\mathtt{a}, \mathtt{A}, \mathtt{b},
\mathtt{B}, \dots\}$ is a finite set of program variables. We suppose each program variable has a type. We further
define a countable set $\mathcal{E}_P$ of all syntactically correct expressions of $P$ over the variables
$\mathcal{V}_P$. We also suppose that each expression in $\mathcal{E}_P$ has a type. When $P$ is known from the
context we write $\mathcal{V}$ and $\mathcal{E}$.

Path in P A sequence $\pi = v_1 v_2 \cdots v_k$ is a path in a program $P$ if for all $1 \le i < k$ the pair
$(v_i, v_{i+1})$ is an edge of $P$. We denote the empty path by $\varepsilon$. We identify the $i$-th vertex in $\pi$
as $\pi(i)$, and $|\pi|$ denotes the total number of vertices in $\pi$. Instead of $\pi(|\pi|)$ we write
$\mathrm{lst}(\pi)$. For each path $\pi$ we define the set $\mathrm{pref}(\pi) = \{\beta \mid \pi = \beta\gamma\}$ of
all prefixes of $\pi$. A path $\pi$ from $l_s$ in a program $P$ is feasible if there exists an input such that the
execution of $P$ on that input follows $\pi$. Otherwise $\pi$ is infeasible.

Backbone paths in P Each acyclic path from $l_s$ to $l_t$ in $P$ is a backbone path in $P$. Let $\pi_1, \pi_2$ be
two backbone paths. Since each backbone path starts in $l_s$, there always exists a non-empty common prefix of
$\pi_1$ and $\pi_2$.

Reduction to a backbone path Let $\pi$ be a path from $l_s$ to $l_t$ in a program $P$. We say that $\pi$ is
reducible to a backbone path $\pi'$ if the following procedure applied to $\pi$ produces exactly the path $\pi'$:
Let $k$ be the least index in $\pi$ such that the vertex $\pi(k)$ occurs in $\pi$ once again (i.e. at an index bigger
than $k$). If no such $k$ exists, then we are done, as $\pi$ is a backbone path. Otherwise let $l$ be the greatest
index such that $\pi(l) = \pi(k)$. If we denote $\pi(k)$ by $u$, then $\pi$ is of the form
$\pi = \alpha u \beta u \gamma$, where $\beta \neq \varepsilon$ (since $l > k$). We set $\pi$ to $\alpha u \gamma$
and repeat the procedure.

Note that for each path $\pi$ from $l_s$ to $l_t$ there exists exactly one backbone path to which $\pi$ is reducible.
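
The reduction procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming a path is given as a list of vertex names; the function name `reduce_to_backbone` is ours.

```python
def reduce_to_backbone(path):
    """Repeatedly cut out the segment between the first repeated vertex
    and its last occurrence, until no vertex occurs twice."""
    path = list(path)
    while True:
        # Least index k whose vertex occurs again later in the path.
        k = next((i for i, v in enumerate(path) if v in path[i + 1:]), None)
        if k is None:
            return path  # acyclic, i.e. a backbone path
        # Greatest index l with path[l] == path[k].
        l = max(j for j, v in enumerate(path) if v == path[k])
        # pi = alpha u beta u gamma  becomes  alpha u gamma.
        path = path[:k + 1] + path[l + 1:]


reduced = reduce_to_backbone(["s", "w", "x", "w", "y", "w", "t"])
```

In the usage line the vertex `w` is visited three times, so both iterations of the loop at `w` are removed in a single step.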
Loop, loop entry vertex, and loop exit vertex of P Let $P$ be a program and $\alpha v$ be an acyclic path
from $l_s$ in $P$. Let $C_v$ be the smallest subset of $V_P$ such that for each path $v v_1 \cdots v_n v$ of two or
more vertices in $P$, where none of $v_1, \dots, v_n \in V_P$ appears in the path $\alpha v$, all the vertices
$v, v_1, \dots, v_n$ belong to $C_v$. If $C_v \neq \emptyset$, then $C_v$ is a loop at $v$ in $P$, and $v$ is a loop
entry vertex of $P$. A vertex $u \in V_P \setminus C_v$ is a loop exit from $C_v$ if there exists $w \in C_v$ such
that $(w, u) \in E_P$. We denote the set of all exit vertices from a loop $C_v$ by $\mathrm{exits}(C_v)$.
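
As an illustration, the set $C_v$ and its exits can be computed by two graph searches. The sketch below assumes the program graph is given as a list of edges and that `alpha` lists the vertices of the acyclic path before `v`; the names `loop_at` and `loop_exits` are hypothetical.

```python
def loop_at(edges, alpha, v):
    """Vertices lying on some cycle from v back to v whose inner vertices
    avoid the path alpha·v; the empty set if v is not a loop entry."""
    blocked = set(alpha) | {v}
    succ, pred = {}, {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)

    def reach(starts, step):
        # Depth-first search that never enters a blocked vertex.
        seen, stack = set(), [w for w in starts if w not in blocked]
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(x for x in step.get(w, ()) if x not in blocked)
        return seen

    # w is in the loop iff it is reachable from v and v is reachable from w.
    inner = reach(succ.get(v, ()), succ) & reach(pred.get(v, ()), pred)
    if inner or v in succ.get(v, ()):
        return {v} | inner
    return set()


def loop_exits(edges, loop):
    """Vertices outside the loop that have an in-edge from the loop."""
    return {u for w, u in edges if w in loop and u not in loop}


EDGES = [("s", "v"), ("v", "a"), ("a", "b"), ("a", "t"), ("b", "v")]
LOOP = loop_at(EDGES, ["s"], "v")
```

For the small graph in the usage lines, the loop at `v` consists of `v`, `a`, and `b`, and its only exit is `t`.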

Program equivalence Programs P and Q are equivalent, if there exists a bijection between all paths from 
start to target vertex in P and all paths from start to target vertex in Q such that sequences of instructions 
along related paths are exactly the same, when ignoring skip instructions. 

Normalized program A program $P$ is normalized if each in-edge of each loop entry vertex of $P$ is labelled
by the instruction skip. Given a program $P$, it is easy to compute a normalized program $P'$ which is
equivalent to $P$: We start with $P'$ as a copy of $P$. For each loop entry vertex $v$ we create its copy $v'$ and
then we replace every edge $(u, v)$ by a new edge $(u, v')$ with the same label. Finally we connect $v'$ with $v$ by
a new edge $(v', v)$ labelled with the skip instruction.

In the remainder of the text, whenever we speak about a program we always assume it is normalized.

Program induced by a loop Let $P$ be a program, $v$ be a loop entry vertex of $P$, and $C$ be a loop at $v$.
We can compute a program $P(C, v)$, representing reachability in $C$, as follows. We start with $P(C, v)$ as a
copy of $C$. To get a program we only need to set the right start and target vertices. In $P(C, v)$ there must be
a copy of $v$. Let $v'$ be the copy. Then we set $v'$ as the start vertex $l_s$ of $P(C, v)$. Further, we add a new
vertex into $P(C, v)$ and we set it as the target vertex $l_t$. Finally we replace each edge $(u, l_s)$ of $P(C, v)$
by a new edge $(u, l_t)$ with the same label. We call the resulting program $P(C, v)$ a program induced by a loop $C$ at $v$.
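
The construction can be sketched as follows, assuming edges are stored in a dictionary mapping a vertex pair to its instruction label and that "a copy of $C$" means the subgraph spanned by the loop vertices. The function name `induced_program` and the default name of the new target vertex are hypothetical.

```python
def induced_program(edges, loop, entry, new_target="l_t'"):
    """Return (edges, start, target) of the program induced by `loop` at `entry`.

    Only edges between loop vertices are kept; every edge that returns to
    the entry vertex is redirected to the fresh target vertex.
    """
    induced = {}
    for (u, v), label in edges.items():
        if u in loop and v in loop:
            goal = new_target if v == entry else v
            induced[(u, goal)] = label
    return induced, entry, new_target


PROGRAM = {
    ("s", "v"): "skip",
    ("v", "a"): "assume(i < n)",
    ("v", "t"): "assume(i >= n)",
    ("a", "b"): "i <- i + 1",
    ("b", "v"): "skip",
}
INDUCED, START, TARGET = induced_program(PROGRAM, {"v", "a", "b"}, "v")
```

In the usage lines the back edge `("b", "v")` becomes `("b", "l_t'")`, so one pass through the induced program corresponds to exactly one iteration of the loop.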

Iterating B Let $P$ be a program induced by some loop $C$ at a loop entry vertex of some bigger program.
Further let $B$ be a backbone tree of $P$. Since $B$ represents all acyclic paths along $C$, any execution
looping in $C$ actually iterates backbone paths in $B$. Therefore, it makes sense to speak about iterating $B$.
Similarly, when we consider a single backbone path $\pi$, we can speak about iterating $\pi$. Note that in
the presence of nested loops in $C$ we can extend the definition recursively to iterating the backbone trees
of sub-induced programs, and so on.

3 Over-approximation $\hat{\varphi}$ of Feasible Paths

Partitioning feasible paths There can be a huge or even infinite number of feasible paths from $l_s$ to $l_t$
in $P$. However, we can partition them into a finite number of classes according to the following lemma.

Lemma 1. Let $\Phi_P$ be the set of all feasible paths from $l_s$ to $l_t$ in $P$. If $\Phi_P \neq \emptyset$, then
there exists a finite partitioning $\Pi_P$ of $\Phi_P$ such that for each partition class $A$ of $\Pi_P$ there exists
a unique backbone path $\pi_A$ in $P$ such that each path $\pi \in A$ is reducible to $\pi_A$.

Proof. Obvious. □

Corollary 1. Let $P$ be a program. Then $|\Pi_P| \le |\mathbf{B}_P|$, where $\mathbf{B}_P$ denotes the set of all
backbone paths of $P$.

Partitioning of path conditions A basic property of symbolic execution says that when symbolic execution
of $P$ terminates, each path condition uniquely identifies one feasible path in $P$ and vice versa. This
bijection between feasible paths and their path conditions implies that the partitioning $\Pi_P$ also
represents a partitioning of the related path conditions. Since both partitionings are equivalent, we do not
distinguish between them.

Over-approximating $A \in \Pi_P$ Any formula $\hat{\varphi}_A$ is called an over-approximation of a partition class
$A \in \Pi_P$ if for each path condition $\varphi \in A$ the formula $\varphi \to \hat{\varphi}_A$ is valid. Note
that $\hat{\varphi}_A = true$ is a trivial over-approximation of $A$.

Over-approximating $\Phi_P$ Any formula $\hat{\varphi}_P$ is called an over-approximation of a non-empty $\Phi_P$ if
for each path condition $\varphi \in \Phi_P$ the formula $\varphi \to \hat{\varphi}_P$ is valid. Note that
$\hat{\varphi}_P = true$ is a trivial over-approximation of $\Phi_P$. We write $\hat{\varphi}$ when $P$ is known
from the context.

We compute $\hat{\varphi}$ as a disjunction $\hat{\varphi}_{A_1} \vee \dots \vee \hat{\varphi}_{A_n}$, where
$\hat{\varphi}_{A_1}, \dots, \hat{\varphi}_{A_n}$ are over-approximations of all partition classes
$A_1, \dots, A_n$ of $\Pi_P$ respectively.

4 Computation of $\hat{\varphi}$

Overview Let $P$ be a program. If $P$ does not contain a loop and the target location is reachable, then each
partition class $A \in \Pi_P$ contains a single path, which is a backbone path. Therefore, if $\pi \in \mathbf{B}_P$
is the only backbone path in a class $A$, we can compute an over-approximation $\hat{\varphi}_A$ of $A$ as follows.
We symbolically execute $\pi$ and receive a path condition $\varphi$ and a symbolic state $\theta$ from the
execution. Since $\varphi \to \varphi$ is valid, we conclude that $\hat{\varphi}_A = \varphi$.

Let us now consider the case when $P$ contains a single loop at some loop entry vertex $w$, the target
location is reachable, and $A \in \Pi_P$ is a partition class such that each $\alpha \in A$ is reducible to a
backbone path $\pi = v_1 \cdots v_j w v_{j+1} \cdots v_n \in \mathbf{B}_P$. We can compute an over-approximation
$\hat{\varphi}_A$ of $A$ as follows. The class $A$ may contain even infinitely many feasible paths
$\alpha_1, \alpha_2, \dots$ But each path $\alpha_i$ is of the form
$\alpha_i = v_1 \cdots v_j \beta_i v_{j+1} \cdots v_n$, where $\beta_i \neq \varepsilon$ represents a different
cyclic path along the loop from $w$ back to $w$. So the paths $\beta_i$ differ in the number of iterations along the
loop, and in the interleaving of paths along the loop in separate iterations. Let us observe the symbolic execution
of the paths $\alpha_i$. The execution proceeds in exactly the same way for the common prefix $v_1 \cdots v_j$. But
then we reach the loop entry vertex $w$, and the symbolic execution proceeds differently for each of the paths
$\beta_i$. As a result we can get even infinitely many different path conditions and symbolic states. To
prevent this, we compute an over-approximation of all those symbolic executions along the loop, so we get
a single over-approximated path condition and a single over-approximated symbolic state. We compute the
over-approximation as follows.

We build the induced program $P'$ of the loop at $w$ and we recursively call symbolic execution of its
backbone paths as we do here for $P$. For each backbone path of $P'$ we receive a single path condition
and a single symbolic state. These path conditions and symbolic states represent all possibilities of how
to symbolically execute the loop once from $w$ back to $w$. But the paths $\beta_i$ may go along the loop an
arbitrary number of times with an arbitrary interleaving of paths through the loop. To handle arbitrary iterations
of backbone paths, we express the values of all program variables of $P'$ as functions of the numbers of iterations
of the backbone paths of $P'$. Then, to handle arbitrary interleaving of backbone paths in different iterations
along the loop, we "merge" the symbolic states of different backbone paths (separately and independently for each
variable) into a single resulting symbolic state. Then we insert the values in the resulting state into the
computed path conditions. We use them to build a formula stating that symbolic execution keeps looping in
the loop until the proper numbers of iterations of the individual backbone paths of $P'$ are met. The formula is
the single resulting path condition. Both the resulting formula and the symbolic state over-approximate the sets of
path conditions and symbolic states of the paths $\beta_i$ respectively, because their computation typically
involves some loss of precision. We discuss the computation of the resulting formula and symbolic state in detail
in separate sections later. Here we only note that when $P'$ also contains some loops, we resolve the situation by
further recursive calls at all loop entry vertices in the backbone paths of $P'$. This is the same process as we
applied at the vertex $w$ of the backbone path $\pi$.

Let us now suppose that we already have the over-approximation, i.e. a single over-approximated path
condition and a single over-approximated symbolic state. We can use them to proceed with the symbolic execution
of the common remainder $v_{j+1} \cdots v_n$ of the paths $\alpha_i$. We then receive a single over-approximated
path condition $\varphi$ and a single over-approximated symbolic state $\theta$ at the end. We show later in the
section that such a computed $\varphi$ is indeed an over-approximation of $A$, i.e. $\hat{\varphi}_A = \varphi$.

The case when the backbone path $\pi$ contains more than one loop entry vertex (i.e. $\pi$ goes through more
than one loop) is now simple. Symbolic executions of the common parts of the paths $\alpha_i$ (i.e. those between
loop entry vertices) are the same for all the paths $\alpha_i$. Whenever we reach a loop entry vertex, we call the
over-approximation procedure to get a single over-approximated path condition and a single over-approximated
symbolic state. At the end we again receive a single over-approximated path condition $\varphi$ representing
$\hat{\varphi}_A$.

When the partitioning $\Pi_P$ has $n$ classes $A_1, \dots, A_n$, we apply the described procedure $n$ times, once
for each class. We receive path conditions $\varphi_1, \dots, \varphi_n$. Then the formula
$\varphi_1 \vee \dots \vee \varphi_n$ is an over-approximation $\hat{\varphi}$ of $\Phi_P$, since each $\varphi_i$
is an over-approximation of $A_i$.

If the target location is not reachable, then none of the computed formulae $\varphi_1, \dots, \varphi_n$ is
satisfiable. Therefore, $\varphi_1 \vee \dots \vee \varphi_n$ is unsatisfiable as well.



It remains to discuss the individual parts of the presented algorithm in detail. First of all, the algorithm is
based on symbolic execution. Therefore, we need formal definitions of symbolic expressions and of a symbolic
state. We provide the definitions in Sections 4.1 and 4.2. Since we symbolically execute backbone paths
of a program, we provide their compact representation in a tree structure, called a backbone tree. The
definition of the tree and its construction can be found in Section 4.3. The key property of the algorithm
is the collection of path conditions computed along backbone paths. Since we work intensively with their
structure, we decompose it along the vertices of a backbone tree. Therefore, we discuss the definition
and handling of path conditions separately in Section 4.4. Symbolic execution of a backbone tree is then
described in Section 4.5. The key part of the algorithm, the computation of an over-approximation of a loop
at an entry vertex, is described in detail in Section 5. Finally, the algorithm building the formula
$\hat{\varphi}$ from the results of the symbolic execution of a backbone tree is described in Section 4.6.



4.1 Symbolic Expressions 

Symbolic expressions Let $P$ be a program, $\mathcal{V}$ be a set of variable names such that
$\mathcal{V}_P \subseteq \mathcal{V}$, and $T_P$ be a first-order theory that captures the constants of
$\mathcal{I}_P$ (like 0, 1, true, etc.), the functions of $\mathcal{I}_P$ (like $+$, $-$, etc.), and the predicates
of $\mathcal{I}_P$ (like $<$, $=$, etc.), and that is a combination of several theories including the theory of
equality and uninterpreted functions and the theory of integers. We extend $T_P$ as follows:

(1) For each program variable $\mathtt{a} \in \mathcal{V}$ of a scalar type $\tau$ we introduce a new constant
symbol $\underline{a}$ ranging over the data domain of the type $\tau$.

(2) For each program variable $\mathtt{A} \in \mathcal{V}$ of an array type $int^n \to \tau$ we introduce a new
function symbol $\underline{A}$ identifying a function from $n$-tuples of integers into the data domain of the
type $\tau$.

(3) For all data types of $P$ we extend their data domains such that they have a special new value $\bot$ in
common. We also introduce a constant symbol $\star$, which is always interpreted as $\bot$.

(4) For any three terms $t, t', t''$ of the extended $T_P$ such that $t$ is of type bool and $t', t''$ have the same
type $\tau$, we introduce a term $ite(t, t', t'')$ of type $\tau$, whose value is $t'$ if $t$ is true, and $t''$
otherwise.

(5) If $t$ is a term of the extended $T_P$ containing the symbol $\star$, then we require that $t = \star$ is a
valid formula in the theory. If $p$ is a predicate of the extended $T_P$ containing the symbol $\star$ as one of its
arguments, then we require that $p \leftrightarrow A$ is a valid formula in the theory, where $A$ is a fresh
propositional variable (in other words, $p$ can be replaced by a fresh propositional variable).

The set $\mathcal{S}_P(\mathcal{V})$ of all terms and formulae of the extended $T_P$ is the set of symbolic
expressions of the program $P$. Each $e \in \mathcal{S}_P(\mathcal{V})$ is a symbolic expression of $P$. Note that
each such $e$ has a type (i.e. if $e$ is a term, then the type of $e$ is the type of an element of a data domain
defined by any interpretation of $T_P$, and if $e$ is a formula, then the type of $e$ is bool). When the program $P$
is known from the context, we write $\mathcal{S}(\mathcal{V})$. If we do not care which superset of $\mathcal{V}_P$
the set $\mathcal{V}$ exactly is, we omit it as well. So we write $\mathcal{S}_P$ or even $\mathcal{S}$
(when $P$ is known from the context).

Basic symbols and variables of basic symbols Let $P$ be a program and $\mathcal{V}$ be a set of variable names
such that $\mathcal{V}_P \subseteq \mathcal{V}$. Then
$\Sigma(\mathcal{S}_P(\mathcal{V})) = \{\underline{a} \mid \mathtt{a} \in \mathcal{V}\}$ is the set of basic symbols
of $\mathcal{S}_P(\mathcal{V})$ and $\mathcal{V}(\mathcal{S}_P(\mathcal{V})) = \mathcal{V}$ is the set of variables
of basic symbols of $\mathcal{S}_P(\mathcal{V})$.

Substitution into symbolic expression Let $h, e, e' \in \mathcal{S}_P$ be symbolic expressions of $P$, where $e$ and
$e'$ are of the same type. Then $h[e/e']$ is the symbolic expression $h$ in which all occurrences of $e$ are replaced
by the expression $e'$. An expression $h[e_1/e_1', \dots, e_n/e_n']$ denotes the simultaneous substitution of all
pairs $e_i/e_i'$ in $h$.

Expression equivalence Let $e, e' \in \mathcal{S}_P$ be two symbolic expressions of $P$. Then $e$ is equal to $e'$
if (1) $e, e'$ are both terms of the extended $T_P$ and $e = e'$ is a valid formula in the extended $T_P$, or
(2) $e, e'$ are both formulae of the extended $T_P$ and $e \leftrightarrow e'$ is a valid formula in the extended $T_P$.

Special variable names of $T_P$ Let $P$ be a program. We distinguish the following sets:
(1) $\mathcal{K} = \{\kappa_i \mid i \in \mathbb{N}\}$ is a set of path counters. Each $\kappa_i$ is a variable of
$T_P$ and it ranges over $\mathbb{N}_0$. (2) $\mathcal{T} = \{\tau_i \mid i \in \mathbb{N}\}$ is a set of parameters.
Each $\tau_i$ is a variable of $T_P$ and it ranges over $\mathbb{N}_0$.
(3) $\mathcal{X} = \{\chi_i \mid i \in \mathbb{N}\}$ is a set of argument placeholders. Each $\chi_i$ is a variable
of $T_P$ and it ranges over integers. We assume all the sets are pairwise disjoint. We further use the following
notation. Let $e \in \mathcal{S}_P$ be a symbolic expression. Then we denote by $\mathcal{K}(e)$ the set of all path
counters appearing in $e$, and by $\mathcal{T}(e)$ the set of all parameters appearing in $e$.

$\tau$-substitution Let $e, h \in \mathcal{S}_P$ be two symbolic expressions of $P$ and
$\{\tau_1, \dots, \tau_n\} \subseteq \mathcal{T}$ be all parameters contained in them. Let $g \in \mathcal{S}_P$ be
any symbolic expression containing none of the parameters $\{\tau_1, \dots, \tau_n\}$. If both $h$ and $g$ are of
the same integer type, then $e\{h/g\}$ is a symbolic expression computed from $e$ as follows. Let $e'$ be a symbolic
expression equal to $e$ with the same parameters and the same number of their occurrences as in $e$, and with a
maximal number of occurrences of $h$ as subexpressions. Then
$e\{h/g\} = e'[h/g][\tau_1/\star, \dots, \tau_n/\star]$. We naturally extend the subexpression substitution to
vector expressions: $e\{\vec{h}/\vec{g}\}$ is the expression $e\{h_1/g_1\} \dots \{h_n/g_n\}$. Note that we require
the vectors $\vec{h}$ and $\vec{g}$ to have the same dimension.

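Comparison of vectors of symbolic expressions Let $\vec{u} = (u_1, \dots, u_n)$ and $\vec{v} = (v_1, \dots, v_n)$ be
two vectors of some symbolic expressions $u_1, \dots, u_n$ and $v_1, \dots, v_n$ respectively. Then we use the
following notation:

    $\vec{u} \le \vec{v} \;\equiv\; u_1 \le v_1 \wedge \dots \wedge u_n \le v_n$
    $\vec{u} < \vec{v} \;\equiv\; \vec{u} \le \vec{v} \wedge \sum_{i=1}^{n} u_i < \sum_{i=1}^{n} v_i$

4.2 Symbolic State

Symbolic noname functions Let $P$ be a program and $\mathcal{V}$ be a set of variable names such that
$\mathcal{V}_P \subseteq \mathcal{V}$. Then
$\mathcal{S}^\lambda_P(\mathcal{V}) = \{\lambda\chi_1, \dots, \chi_n \cdot e \mid e \in \mathcal{S}_P(\mathcal{V})
\wedge n \in \mathbb{N} \wedge \chi_1, \dots, \chi_n \in \mathcal{X}\}$ is the set of symbolic noname functions of
$\mathcal{S}_P(\mathcal{V})$. Let $\lambda\chi_1, \dots, \chi_n \cdot e \in \mathcal{S}^\lambda_P(\mathcal{V})$ and
$e, e_1, \dots, e_n \in \mathcal{S}_P(\mathcal{V})$. Then
$(\lambda\chi_1, \dots, \chi_n \cdot e)(e_1, \dots, e_n) \in \mathcal{S}_P(\mathcal{V})$ is the symbolic expression
$e[\chi_1/e_1, \dots, \chi_n/e_n]$. When the program $P$ is known from the context, we write
$\mathcal{S}^\lambda(\mathcal{V})$. If we do not care which superset of $\mathcal{V}_P$ the set $\mathcal{V}$ exactly
is, we omit it as well. So we write $\mathcal{S}^\lambda_P$ or even $\mathcal{S}^\lambda$ (when $P$ is known from
the context).

Symbolic state Let $P$ be a program and $\mathcal{V}$ be a set of variable names such that
$\mathcal{V}_P \subseteq \mathcal{V}$. A function
$\theta : \mathcal{V} \to \mathcal{S}_P(\mathcal{V}) \cup \mathcal{S}^\lambda_P(\mathcal{V})$ is a symbolic state of
$P$ if it satisfies the following:

• If $\mathtt{a} \in \mathcal{V}$ is of a scalar type $\tau$, then $\theta(\mathtt{a}) \in \mathcal{S}_P(\mathcal{V})$
is also of the scalar type $\tau$.

• If $\mathtt{A} \in \mathcal{V}$ is of an array type $int^n \to \tau$, then
$\theta(\mathtt{A}) \in \mathcal{S}^\lambda_P(\mathcal{V})$ is also of the type $int^n \to \tau$, and it is of
the form $\lambda\chi_1, \dots, \chi_n \cdot e$ for some $e \in \mathcal{S}_P(\mathcal{V})$ of type $\tau$. We often
use the abbreviated vector notation $\lambda\vec{\chi} \cdot e$.

Symbolic states Let $P$ be a program, $\mathcal{S}_P$ be a set of symbolic expressions and $\mathcal{S}^\lambda_P$
be the set of symbolic noname functions of $\mathcal{S}_P$. Then we denote by
$\mathcal{M}(\mathcal{S}_P) = \{\theta \mid \theta : \mathcal{V}(\mathcal{S}_P) \to \mathcal{S}_P \cup
\mathcal{S}^\lambda_P\}$ the set of symbolic states of $P$.

The most general symbolic state Let $P$ be a program. We distinguish a special symbolic state $\theta_G$ of
$P$. It has the following properties:

• For each $\mathtt{a} \in \mathcal{V}(\mathcal{S}_P)$ of a scalar type we have $\theta_G(\mathtt{a}) = \underline{a}$.

• For each $\mathtt{A} \in \mathcal{V}(\mathcal{S}_P)$ of an array type we have
$\theta_G(\mathtt{A}) = \lambda\vec{\chi} \cdot \underline{A}(\vec{\chi})$.

The most unknown symbolic state Let $P$ be a program. We distinguish a special symbolic state $\theta_\star$ of
$P$. It has the following properties:

• For each $\mathtt{a} \in \mathcal{V}(\mathcal{S}_P)$ of a scalar type we have $\theta_\star(\mathtt{a}) = \star$.

• For each $\mathtt{A} \in \mathcal{V}(\mathcal{S}_P)$ of an array type we have
$\theta_\star(\mathtt{A}) = \lambda\vec{\chi} \cdot \star$.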

Substitution into symbolic state Let $\theta$ be a symbolic state of $P$ and $e, e'$ some symbolic expressions
of $P$ of the same type. Then $\theta[e/e']$ is a symbolic state of $P$ such that for each variable
$\mathtt{a} \in \mathcal{V}(\mathcal{S}_P)$ we have $\theta[e/e'](\mathtt{a}) = \theta(\mathtt{a})[e/e']$. A symbolic
state $\theta[e_1/e_1', \dots, e_n/e_n']$ denotes the simultaneous substitution of all pairs $e_i/e_i'$ into $\theta$.

Change in symbolic state Let $\theta$ be a symbolic state of $P$, $\mathtt{a} \in \mathcal{V}(\mathcal{S}_P)$ be a
program variable of a scalar type $\tau$, $\mathtt{A} \in \mathcal{V}(\mathcal{S}_P)$ be a program variable of an
array type $int^n \to \tau$, and $e$ be a symbolic expression of $P$ of the type $\tau$. Then
$\theta[\mathtt{a} \to e]$ is a symbolic state equal to $\theta$ except for the variable $\mathtt{a}$, where
$\theta[\mathtt{a} \to e](\mathtt{a}) = e$, and $\theta[\mathtt{A} \to e]$ is a symbolic state equal to $\theta$
except for the variable $\mathtt{A}$, where
$\theta[\mathtt{A} \to e](\mathtt{A}) = \lambda\chi_1, \dots, \chi_n \cdot e$.

Extending symbolic state to program expressions Let $\theta$ be a symbolic state of $P$ and $e \in \mathcal{E}_P$
be a program expression. Then $\theta(e) \in \mathcal{S}_P$ is the symbolic expression obtained from $e$ such that
(1) each occurrence of each variable $\mathtt{a}$ appearing in $e$ is replaced by the symbolic expression
$\theta(\mathtt{a})$, where we assume that all substitutions are applied simultaneously, and (2) all constant,
operator and function symbols appearing in $e$ are replaced by their counterparts in $T_P$.

Substituting symbolic state into symbolic expression Let $e \in \mathcal{S}_P$ and $\theta$ be a symbolic state of
$P$. Then $e\theta \in \mathcal{S}_P$ is the symbolic expression obtained from $e$ such that each occurrence of each
basic symbol $\underline{a} \in \Sigma(\mathcal{S}_P)$ appearing in $e$ is replaced by the symbolic expression
$\theta(\mathtt{a})$. We assume that all the substitutions are applied simultaneously.

Merge of symbolic states Let $\theta$ and $\theta'$ be two symbolic states of $P$. Then $\theta\theta'$ denotes the
symbolic state of $P$ such that for each program variable $\mathtt{a}$ we have
$(\theta\theta')(\mathtt{a}) = (\theta(\mathtt{a}))\theta'$.
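
The last two definitions can be made concrete with a small Python sketch. It is only an illustration under simplifying assumptions: states cover scalar variables only, symbolic expressions are nested tuples, and a basic symbol $\underline{a}$ is encoded as `("sym", "a")`. The names `substitute` and `merge` are hypothetical.

```python
def substitute(expr, theta):
    """e·theta: replace every basic symbol in expr by theta's value for the
    corresponding variable; one pass, so the replacement is simultaneous."""
    if isinstance(expr, int):
        return expr
    if expr[0] == "sym":
        return theta.get(expr[1], expr)
    op, *args = expr
    return (op, *[substitute(arg, theta) for arg in args])


def merge(theta, theta2):
    """(theta theta2)(a) = (theta(a)) theta2 for each program variable a."""
    return {var: substitute(expr, theta2) for var, expr in theta.items()}


a, b = ("sym", "a"), ("sym", "b")
first = {"a": ("+", a, 1), "b": a}      # a := a + 1, b := old a
second = {"a": b, "b": ("*", 2, a)}     # a := b,     b := 2 * a
merged = merge(first, second)
```

Because the substitution is done in one pass, the symbols introduced by `second` are not substituted again; in the usage lines `merged` maps `a` to `b + 1` and `b` to `b`.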

4.3 Backbone Tree 

Any two backbone paths of a program $P$ always have some non-empty prefix in common. Therefore, we
can store the backbone paths efficiently in a tree defined as follows.

Backbone tree of P Let $V_B$ be the set of all non-empty prefixes of backbone paths of a program $P$. Let
$E_B \subseteq V_B \times V_B$ be the set of all pairs $(\alpha, \alpha v)$. Then we call the rooted tree
$B_P = (V_B, E_B, l_s)$, where $l_s$ is the root, a backbone tree of $P$. Note that vertices of $B_P$ identify
acyclic paths from $l_s$ in $P$. We denote the set of all leaf vertices of $B_P$ by $\mathbf{B}_P$. Note that
$\mathbf{B}_P$ is actually the set of all the backbone paths of $P$. When a program is known from the context or it
is not important, we simply write $V$, $E$, $B$ and $\mathbf{B}$. Algorithm 1 computes $B$ for a program $P$.

Algorithm 1: buildBackboneTree(P)

    Input:  $P$   // a normalized program
    Output: $B$   // a backbone tree for $P$

    $V_B \leftarrow \{l_s\}$
    $E_B \leftarrow \emptyset$
    $D \leftarrow \emptyset$
    while there is a leaf $\alpha u \in V_B \setminus D$ such that $u \neq l_t$ do
        foreach vertex $v$ such that $(u, v) \in E_P$ do
            if $\exists k$ such that $v = (\alpha u)(k)$ then
                $D \leftarrow D \cup \{\alpha u\}$
            else
                $V_B \leftarrow V_B \cup \{\alpha u v\}$
                $E_B \leftarrow E_B \cup \{(\alpha u, \alpha u v)\}$
    while $D \neq \emptyset$ do
        $\alpha u \leftarrow$ any element of $D$
        $D \leftarrow D \setminus \{\alpha u\}$
        if $\alpha u$ is a leaf vertex in the current $B$ then
            $D \leftarrow D \cup \{\alpha\}$
            $E_B \leftarrow E_B \setminus \{(\alpha, \alpha u)\}$
            $V_B \leftarrow V_B \setminus \{\alpha u\}$
    return $B$
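
A possible Python rendering of Algorithm 1 is sketched below. It is an illustration rather than the authors' implementation: the program graph is assumed to be given as a successor dictionary, tree vertices are encoded as tuples of program vertices, a work list replaces the "there is a leaf" test, and the function name `build_backbone_tree` is ours.

```python
def build_backbone_tree(succ, start, target):
    """Return (vertices, edges) of the backbone tree; each tree vertex is a
    tuple holding an acyclic path from `start`."""
    vertices, edges, dead = {(start,)}, set(), set()
    work = [(start,)]
    # Phase 1: unfold all acyclic paths; remember paths that hit a cycle.
    while work:
        path = work.pop()
        u = path[-1]
        if u == target:
            continue
        successors = succ.get(u, ())
        if not successors:
            dead.add(path)  # dead end that is not the target (defensive case)
        for v in successors:
            if v in path:
                dead.add(path)  # following (u, v) would close a cycle
            else:
                child = path + (v,)
                vertices.add(child)
                edges.add((path, child))
                work.append(child)
    # Phase 2: prune leaves that cannot be extended to the target.
    while dead:
        path = dead.pop()
        has_child = any(parent == path for parent, _ in edges)
        if not has_child and len(path) > 1:
            parent = path[:-1]
            dead.add(parent)
            edges.discard((parent, path))
            vertices.discard(path)
    return vertices, edges


SUCC = {"s": ["a"], "a": ["b", "t"], "b": ["a"]}
TREE_VERTICES, TREE_EDGES = build_backbone_tree(SUCC, "s", "t")
```

In the usage lines the path `s a b` can only continue back to `a`, so it is pruned and the tree keeps the single backbone path `s a t`.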



Loop entry vertex, and loop Let $B$ be a backbone tree of a program $P$. Then each vertex $\alpha v \in V_B$
such that $v$ is a loop entry vertex of $P$ is a loop entry vertex of $B$. Let $C$ be a loop at $v$. Then $C$ is also
a loop at $\alpha v$.

Counting backbone paths of induced programs Let $B$ be a backbone tree of a program $P$. We define
a function $\eta_B : V_B \to \mathbb{N}$ as follows. Let $\alpha v \in V_B$ be a vertex of $B$. If $\alpha v$ is not
a loop entry vertex, then $\eta_B(\alpha v) = 0$. Otherwise, let $C$ be a loop at the loop entry vertex $v$, and
$\mathbf{B}_{B'}$ be the set of all leaf vertices of a backbone tree $B'$ of the induced program $P(C, v)$. Then
$\eta_B(\alpha v) = |\mathbf{B}_{B'}|$. When $B$ is known from the context we write $\eta$.

4.4 Path Condition 

Function $\Psi$ Let $B$ be a backbone tree of a program $P$. In Section 4.5 we show how to execute $B$
symbolically. In our analysis, path conditions from these executions play a crucial role. Since we work with
them intensively and we examine their internal structure, it is not practical to represent them as whole
formulae (as is typical in original symbolic execution). We rather attach their parts to the vertices of $B$. This
can be explained as follows. Let $\pi \in \mathbf{B}_B$ be a backbone path of $P$. When executing $\pi$
symbolically, we execute the instructions occurring along the path $\pi$. Execution of some instructions may extend
the current path condition $\varphi$ by some formula $\gamma$ such that the extended path condition is of the form
$\varphi \wedge \gamma$. Other instructions only change the symbolic state, but keep the path condition $\varphi$
unchanged. To unify the approach for all instructions, we want these instructions to extend the path condition
$\varphi$ by some formula $\gamma$ as well. It suffices to take $\gamma = true$ for these instructions. Now we can
assign to each vertex along $\pi$ the formula $\gamma$ obtained from executing an instruction. More precisely, the
path condition is initially set to true. Therefore, we assign the formula true to the first vertex of $\pi$. Now
suppose that symbolic execution has reached a vertex $\alpha u$ of $\pi$ and $\alpha u v$ is the next one in $\pi$.
Then the execution of the instruction $\iota((u, v))$ produces a formula $\gamma$ which we attach to the
vertex $\alpha u v$. It is important to note that we can always reconstruct the actual path condition $\varphi$ in
each step of symbolic execution from the formulae attached to the vertices along the currently processed path: we
return the conjunction of those formulae.
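
The reconstruction of a path condition from the attached formulae can be sketched as follows. This is a minimal illustration for paths without loop entry vertices, assuming the attached formulae are stored as strings in a dictionary keyed by tree vertices (tuples of program vertices); the names `path_condition` and `PSI` are hypothetical.

```python
def path_condition(path, psi):
    """Collect the formulae attached to all non-empty prefixes of `path`.

    The result is the list of conjuncts of the path condition; the trivial
    conjunct "true" is dropped, so an empty list stands for true.
    """
    prefixes = [tuple(path[:i]) for i in range(1, len(path) + 1)]
    return [psi[prefix] for prefix in prefixes if psi[prefix] != "true"]


PSI = {
    ("s",): "true",               # initial path condition
    ("s", "a"): "x > 0",          # from assume(x > 0)
    ("s", "a", "b"): "true",      # from an assignment
    ("s", "a", "b", "t"): "y < x",
}
conjuncts = path_condition(("s", "a", "b", "t"), PSI)
```

Since every prefix of the path is itself a vertex of the backbone tree, the same function also yields the path condition at any intermediate step of the execution.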






The situation is different in loop entry vertices of $B$. There we enter a loop and we call the
over-approximation algorithm. The result of the call is a single (over-approximated) path condition and a single
(over-approximated) symbolic state. We assign the resulting formula to the loop entry vertex. Note that there is
always room for the formula, since we assume only normalized programs, so all in-edges of loop entry vertices are
labelled with the skip instruction.

So all parts of path conditions can indeed be assigned to vertices of $B$. We formally introduce a function
$\Psi_B : V_B \to \mathcal{S}_P$ assigning to each vertex of $B$ a symbolic expression of type bool. We build the
content of the function during symbolic executions of the backbone paths of $B$. We discuss the details of the
execution in Section 4.5. Since $\Psi_B$ contains the formulae from which we construct the path conditions, this
function is a key component of the whole algorithm. When a backbone tree is known from the context we simply
write $\Psi$.

Path counters at loop entry vertex The key part of the algorithm is the computation of an over-approximation
of a loop at some loop entry vertex. We already know that we compute the over-approximation by
expressing the values of program variables as functions of how many times the backbone paths of the induced program
of the loop are executed. For this purpose we introduce for each such backbone path a single and unique
path counter. A path counter is a variable of the theory $T_P$ of an integer type. We have already distinguished
the infinite set $\mathcal{K}$ of variable symbols for the path counters.

For each loop entry vertex of $B$ we know exactly how many fresh path counters we need to introduce.
The count is equal to the number of backbone paths of the induced program at the loop entry vertex. We use
the following naming convention for identifying path counters introduced at loop entry vertices: Let $\alpha$ be
a loop entry vertex of $B$. Then we identify the fresh path counters introduced at $\alpha$ as
$\kappa_{\alpha,1}, \dots, \kappa_{\alpha,\eta(\alpha)}$. We assume that the order of backbone paths in an induced
program is fixed, to provide a unique mapping between the path counters and the related backbone paths.

Path condition part at vertex of B Let $\alpha \in V_B$ be a vertex of B and let
$\vec{\kappa}_\alpha = (\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,\eta(\alpha)})^T$ identify all the path counters introduced at $\alpha$. Then the formula



is a path condition part at vertex $\alpha$. When a backbone tree is known from the context, we simply write
$pc(\alpha, \Psi, \varphi)$. Note that $pc$ has the additional parameter $\varphi$ to allow insertion of a formula into the scope of the
existential quantifier introduced in the last case of the definition.

Path condition at vertex of B Let $v_1 v_2 \cdots v_k \in V_B$, where $k > 0$, be a vertex of B. Then the recursively
defined formula

$$pc_B(v_1, \Psi, pc_B(v_1 v_2, \Psi, \ldots pc_B(v_1 v_2 \cdots v_k, \Psi, \mathit{true}) \ldots))$$

is a path condition at vertex $v_1 v_2 \cdots v_k$. When a backbone tree is known from the context, we simply write
$pc(\alpha, \Psi)$.

4.5 Symbolic Execution of Backbone Tree

Let B be a backbone tree of a program P. To execute B symbolically means that we symbolically execute
all its backbone paths. To symbolically execute a backbone path $\pi$ of B means the following. We start at
the first vertex $l_s$ of $\pi$. There we set $\Psi(l_s) = \mathit{true}$ and we set the actual symbolic state $\theta$ to be the most
general one, i.e. $\theta_0$. Then we proceed along $\pi$ vertex by vertex until we process the last one. Let $\alpha$ be the
vertex of $\pi$ processed last and let $v$ be its successor in $\pi$. If $v$ is a loop entry vertex of P, then we call an
algorithm, described in detail in Section 5, computing an over-approximation of the loop. If $v$ is not a loop
vertex of P, then we symbolically execute the instruction labelling the program edge that leads to $v$. We
discuss symbolic execution of individual instructions in detail later in this section. In both cases we receive
a formula, which we put into $\Psi$, and an updated symbolic state. Then we proceed to the next vertex of $\pi$
with the updated state. It may also happen at some vertex during symbolic execution of $\pi$ that the path
condition, composed of formulae assigned to already processed vertices of $\pi$, is not satisfiable. Then there is
no feasible path in P reducible to $\pi$. Therefore, we stop the execution at that vertex. We can also remove
the path $\pi$ from the tree B, since we have discovered that it is useless for reachability of the target location
of P.

In Algorithm 2 we present symbolic execution of B in more detail. The algorithm works as described
above, but we do not execute backbone paths separately one by one. We rather execute them simultaneously,
all at once. Therefore, we maintain a set Q of the most recently processed vertices of all backbone paths.
Since we also need to save the actual symbolic states at those vertices, the elements of Q are pairs consisting
of a vertex and a symbolic state. Another difference is that the algorithm also computes a function $\Theta$
assigning final symbolic states to leaves of B. This function is a by-product of the algorithm. It is only used
by the over-approximation algorithm of Section 5: to compute an over-approximated symbolic state of a loop,
the backbone tree of the induced program of that loop is symbolically executed first (by this algorithm).
Let us discuss all three cases which may occur at each vertex during the execution.

At line 8 we determine whether the successor vertex $\alpha uv$ of $\alpha u$ is a loop entry or not. If so, then we identify
a loop C at v and at line 10 we call the over-approximation algorithm overapproximateLoop, discussed in
Section 5, to obtain a formula $\varphi^\kappa$, which is an over-approximation of path conditions of all feasible paths
looping in C, and a symbolic state $\theta^\kappa$, which is an over-approximation of all changes in symbolic state made
by all feasible paths looping in C. Having these over-approximations, we need to integrate them into the current
symbolic execution. It means that we assign the formula $\varphi^\kappa$ into function $\Psi$ at vertex $\alpha uv$, and we store
$\alpha uv$ together with $\theta^\kappa$ to be able to process successors of $\alpha uv$ in B later. Note that both $\varphi^\kappa$ and $\theta^\kappa$ are
updated by the symbolic state $\theta$ before they are integrated. This is because the over-approximation of C is
computed independently from the remainder of P, while the symbolic state $\theta$ captures the current progress of
symbolic execution up to the loop vertex v. We need to incorporate that progress into the over-approximation
before we integrate it into symbolic execution of B.

If $\alpha uv$ is not a loop entry vertex, then we must symbolically execute the instruction $\iota((u, v))$ labelling
the program edge $(u, v)$. Since it is a purely technical matter, we leave its detailed description to the end of
this section. Having the instruction executed, we receive a formula representing an add-on to the current path
condition. Therefore, we can directly assign it into $\Psi$ at $\alpha uv$. As the second value from the execution of
$\iota((u, v))$, we receive an updated symbolic state, capturing the effect of $\iota((u, v))$ on the original symbolic state
$\theta$. Next we check whether the path condition, composed of all formulae assigned to vertices along the path
$\alpha uv$ so far, is satisfiable.

Let us first suppose that the path condition is satisfiable. If we have not reached the target vertex yet, we
store the current progress in Q. Otherwise we store the final symbolic state into function $\Theta$ for the leaf $\alpha uv$
and we are done executing the current backbone path.

In case the path condition is not satisfiable, we stop symbolic execution at $\alpha uv$. We know that any
further progress from $\alpha uv$ along any backbone path with prefix $\alpha uv$ cannot represent a feasible path to the
target location. Therefore, we can reduce B by removing from it exactly those backbone paths with
the prefix $\alpha uv$, while keeping all the others. The reduced set of vertices of B is computed at line 21.
Then we need to update all the remaining sets forming B. We must not forget to also update function $\Psi$ so
that at the end it is defined only on the proper set of vertices of B.

Note that symbolic execution of B is always finite: B is a finite binary tree of backbone paths, the
same holds for the backbone trees of the induced programs of its loops, and there is a finite number of loops
in a program.
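The worklist scheme described above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not part of the paper: vertices are tuples of locations, symbolic values are Python closures over the program inputs, the satisfiability check is a brute-force search over a small finite domain standing in for an SMT solver, and loop entry vertices are not handled. All names (`execute_backbone_tree`, `satisfiable`, `INPUTS`, `DOMAIN`) are hypothetical.

```python
from itertools import product

INPUTS = ("x", "y")      # hypothetical program inputs
DOMAIN = range(-3, 4)    # small finite domain standing in for an SMT solver


def satisfiable(parts):
    """Brute-force check that the conjunction of `parts` holds for some input."""
    for values in product(DOMAIN, repeat=len(INPUTS)):
        inp = dict(zip(INPUTS, values))
        if all(part(inp) for part in parts):
            return True
    return False


def execute(instr, state):
    """Return (formula part, updated symbolic state) for one instruction."""
    def evaluate(inp):
        # concrete values of all variables under the symbolic state `state`
        return {var: value(inp) for var, value in state.items()}

    if instr[0] == "assume":
        cond = instr[1]
        return (lambda inp: cond(evaluate(inp))), state
    if instr[0] == "assign":
        _, var, expr = instr
        new_state = dict(state)
        new_state[var] = lambda inp: expr(evaluate(inp))
        return (lambda inp: True), new_state
    return (lambda inp: True), state  # skip


def execute_backbone_tree(root, children, target):
    """Worklist execution; `children` maps a vertex to (child, instruction) pairs."""
    psi = {root: (lambda inp: True)}
    finals = {}
    initial = {var: (lambda inp, var=var: inp[var]) for var in INPUTS}
    queue = [(root, initial)]
    while queue:
        vertex, state = queue.pop()
        for child, instr in children.get(vertex, []):
            part, new_state = execute(instr, state)
            psi[child] = part
            # path condition = conjunction of the parts along the prefix of `child`
            prefix = [psi[child[:k]] for k in range(1, len(child) + 1)]
            if not satisfiable(prefix):
                del psi[child]          # no feasible path has this prefix
            elif child[-1] == target:
                finals[child] = new_state
            else:
                queue.append((child, new_state))
    return psi, finals


children = {
    ("s",): [(("s", "a"), ("assume", lambda v: v["x"] > 0)),
             (("s", "b"), ("assume", lambda v: v["x"] > 5))],  # infeasible in DOMAIN
    ("s", "a"): [(("s", "a", "t"), ("assign", "y", lambda v: v["x"] + 1))],
    ("s", "b"): [(("s", "b", "t"), ("skip",))],
}
psi, finals = execute_backbone_tree(("s",), children, "t")
```

In this toy tree the branch through `b` is pruned as soon as its path condition is found unsatisfiable, so its successors are never visited. Unlike line 21 of Algorithm 2, the sketch does not recompute the remaining vertex set of the tree; it only drops the infeasible vertex.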

Algorithm 2: executeBackboneTree(B, P)
Input:
    B // a backbone tree of P
    P // a normalized program
Output:
    Ψ // function assigning parts of path conditions to vertices of B
    Θ : B_B → M(S) // final symbolic states at leaves of B

 1 Ψ ← {(l_s, true)}
 2 Θ ← ∅
 3 Q ← {(l_s, θ_0)}
 4 repeat
 5     (αu, θ) ← any element of Q
 6     Q ← Q \ {(αu, θ)}
 7     foreach αuv ∈ V_B do
 8         if αuv is a loop entry vertex of B then
 9             Let C be a loop at v
10             (φ^κ, θ^κ) ← overapproximateLoop(C, v)
11             Ψ(αuv) ← φ^κ θ
12             Q ← Q ∪ {(αuv, θ^κ θ)}
13         else
14             (Ψ(αuv), θ') ← ι((u, v))(θ, pc(αu, Ψ))
15             if pc(αuv, Ψ) is satisfiable then
16                 if v ≠ l_t then
17                     Q ← Q ∪ {(αuv, θ')}
18                 else
19                     Θ(αuv) ← θ'
20             else
21                 V_B ← {β | ∃π ∈ B_B . αuv ∉ pref(π) ∧ β ∈ pref(π)}
22                 E_B ← E_B|V_B
23                 B_B ← V_B ∩ B_B
24                 Ψ ← Ψ|V_B
25 until Q = ∅
26 return (Ψ, Θ)

Symbolic execution of a program instruction Let P be a program, $\varphi$ be a symbolic expression of P
of type bool representing a path condition, $\theta$ be a symbolic state of P, and let $\iota$ be an instruction. Then
we compute the result $\iota(\theta, \varphi)$ of symbolic execution of $\iota$ in $\theta$ and $\varphi$ according to the syntactic structure of $\iota$ as
follows. We assume that $a$ is a variable of a scalar type $\tau$, $A$ is a variable of an array type $int^n \to \tau$, $\gamma$ is a program
expression of type bool, $e$ is a program expression of type $\tau$, and $e_1, \ldots, e_n$ are program expressions of P of
type int.

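• $\iota$ is an assumption assume($\gamma$): If the formula $\varphi \to \theta(\gamma)$ is satisfiable, then $\iota(\theta, \varphi)$ is the pair $(\theta(\gamma), \theta)$, and
$(\mathit{false}, \theta)$ otherwise.

• $\iota$ is an assertion assert($\gamma$): If the formula $\varphi \to \theta(\gamma)$ is valid, then $\iota(\theta, \varphi)$ is the pair $(\mathit{true}, \theta)$, and $(\mathit{false}, \theta)$
otherwise.

• $\iota$ is the instruction skip: Then $\iota(\theta, \varphi)$ is the pair $(\mathit{true}, \theta)$.

• $\iota$ is an assignment $a \leftarrow e$: Then $\iota(\theta, \varphi)$ is the pair $(\mathit{true}, \theta[a \to \theta(e)])$.

• $\iota$ is an assignment $A(e_1, \ldots, e_n) \leftarrow e$: Then $\iota(\theta, \varphi)$ is the pair
$(\mathit{true}, \theta[A \to ite(x_1 = \theta(e_1) \wedge \cdots \wedge x_n = \theta(e_n), e, \theta(A)(x_1, \ldots, x_n))])$.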
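The cases above can be illustrated with a small Python sketch over symbolic expressions encoded as nested tuples. The encoding, the helper `apply_state` (playing the role of applying $\theta$ to an expression) and the instruction tags are assumptions made only for this illustration. The sketch omits the solver checks of the assume and assert cases, and in the array case it stores the written value after applying the symbolic state to it, which is an assumption of the sketch.

```python
def apply_state(state, expr):
    """theta(expr): replace every program variable by its symbolic value."""
    if isinstance(expr, str):
        return state.get(expr, expr)      # basic symbols and parameters stay as they are
    if isinstance(expr, tuple):
        # position 0 holds the operator name and is never substituted
        return tuple(part if i == 0 else apply_state(state, part)
                     for i, part in enumerate(expr))
    return expr                           # numeric constants


def execute(instr, state):
    """Return the pair (formula part, updated symbolic state)."""
    kind = instr[0]
    if kind == "assume":
        # the satisfiability check of the assume case is omitted in this sketch
        return apply_state(state, instr[1]), state
    if kind == "skip":
        return True, state
    if kind == "assign":
        _, var, expr = instr
        return True, {**state, var: apply_state(state, expr)}
    if kind == "array_assign":
        _, arr, indices, expr = instr
        cond = ("and",) + tuple(("=", f"x{i + 1}", apply_state(state, e))
                                for i, e in enumerate(indices))
        new_value = ("ite", cond, apply_state(state, expr), state[arr])
        return True, {**state, arr: new_value}
    raise ValueError(f"unknown instruction {kind!r}")


# initial state: i holds basic symbol i0, A holds the uninterpreted array A0
state0 = {"i": "i0", "A": ("app", "A0", "x1")}
_, state1 = execute(("assign", "i", ("+", "i", 1)), state0)
part, state2 = execute(("array_assign", "A", ["i"], ("*", "i", 2)), state1)
```

After the two instructions the array value is an `ite` whose condition compares the formal parameter `x1` with `i0 + 1`; reading any other index falls through to the original array. Names of basic symbols are assumed to be disjoint from names of program variables.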



4.6 Building $\varphi$

After symbolic execution of a backbone tree B of a program P we have computed all the information we need
to build the resulting over-approximation $\varphi$ of $\mathcal{P}_P$. The information is stored in function $\Psi$ as formulae attached
to vertices of B. We know that for a backbone path $\pi \in B_B$ the formula $pc_B(\pi, \Psi)$ is an over-approximated
path condition for all feasible paths reducible to $\pi$. Therefore, the over-approximation $\varphi$ is given by the
formula

$$\begin{cases} \mathit{false} & \text{if } B_B = \emptyset \\ \bigvee_{\pi \in B_B} pc_B(\pi, \Psi) & \text{otherwise.} \end{cases}$$

Since backbone paths always have a non-empty common prefix, we can usually simplify the
formula $\varphi$. For each pair of backbone paths we move the common part of their path conditions in front of the
disjunction of their remainders.

Observe that the composition of backbone paths in B precisely matches the structure of such a simplified formula.
Therefore, we can infer a simple algorithm on B which builds $\varphi$ in already simplified form. We depict its
pseudo-code in Algorithm 3. The algorithm is recursive. It accepts the backbone tree B, the function $\Psi$ already
filled in during symbolic execution of B, and a vertex $\alpha$ of B. To receive $\varphi$ we need to call the algorithm
with the root vertex $l_s$.

Algorithm 3: buildSimplified(B, Ψ, α)

1 γ ← ite(α is a leaf of B, true, false)
2 foreach αv ∈ V_B do
3     γ ← (γ ∨ buildSimplified(B, Ψ, αv))
4 return pc_B(α, Ψ, γ)

Note that Algorithm 3 cannot be used when B is empty. Nevertheless, this case is trivial, since $\varphi$ is then
false. We use Algorithm 3 only for non-empty backbone trees.
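A minimal Python sketch of the recursion is given below. It assumes, purely for illustration, that formulae are strings in Python boolean syntax and that for a vertex without a loop the path condition part is just the conjunction of the formula stored at the vertex with the formula passed in; loop entry vertices, where the passed formula goes under an existential quantifier, are not modelled. The names `build_simplified`, `children` and `psi` are hypothetical.

```python
def build_simplified(children, psi, vertex):
    """Build the factored formula for the subtree rooted at `vertex`."""
    kids = children.get(vertex, [])
    if kids:
        # disjunction of the remainders of all backbone paths through `vertex`
        gamma = " or ".join(build_simplified(children, psi, kid) for kid in kids)
    else:
        gamma = "True"          # a leaf contributes no further constraint
    # the common prefix psi[vertex] is placed in front of the disjunction
    return f"({psi[vertex]} and ({gamma}))"


# toy backbone tree with two backbone paths sharing the prefix s, sa
children = {"s": ["sa"], "sa": ["sab", "sac"]}
psi = {"s": "True", "sa": "x > 0", "sab": "y < x", "sac": "y >= x + 2"}
formula = build_simplified(children, psi, "s")
```

The shared constraint `x > 0` occurs only once in the result, in front of the disjunction of the two remainders, which is the simplification described above.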

5 Loop Over-approximation 

Let C be a loop at a loop entry vertex v of a backbone tree B of program P. We want to over-approximate
all feasible paths representing all possible looping in C by a single formula $\varphi^\kappa$ and a single symbolic state $\theta^\kappa$.
The formula $\varphi^\kappa$ is an over-approximation of path conditions of all those feasible paths and it is supposed to
ensure that none of these feasible paths is terminated early. In other words, it prunes out all those inputs
to the loop C for which an execution of the loop would terminate in some vertex
of C different from v. Therefore, we call the formula $\varphi^\kappa$ a looping condition of C. The symbolic state $\theta^\kappa$
over-approximates all changes to the symbolic state which could be made by the feasible paths looping in C.
Since its computation is based on expressing values of program variables as functions of how many times
backbone paths of the induced program of C are iterated, we call the symbolic state $\theta^\kappa$ an iterated symbolic state
of C.

We depict the computation of the over-approximation $(\varphi^\kappa, \theta^\kappa)$ of C in Algorithm 4. We first build an
induced program P' for the loop C at v and then we construct a backbone tree B' of P'. When we have
B', we can execute it symbolically as described in Section 4.5. As a result of the execution we receive
functions $\Psi'$ and $\Theta'$. At line 4 we resolve the trivial case when the backbone tree B' becomes empty after its
symbolic execution. That indicates that there is no feasible path iterating in C. Therefore, the value returned at
that line is indeed an over-approximation of C. If B' is not empty, we can proceed further in the computation.
We compute the over-approximation $(\varphi^\kappa, \theta^\kappa)$ from the functions $\Psi'$ and $\Theta'$. First we compute the iterated
symbolic state $\theta^\kappa$. This is done at lines 5-7. The step at line 5 is technical. The computation of $\theta^\kappa$ involves
the presence of some artificial program variables and basic symbols in functions $\Psi'$ and $\Theta'$. To preserve the original
functions, we build copies $\Phi$ and $\Theta$ of functions $\Psi'$ and $\Theta'$, in which we introduce those artificial variables
and symbols. The computation of $\theta^\kappa$ itself is done at line 6. We postpone the detailed description of both
the introduction of artificial variables and the computation of $\theta^\kappa$ to Section 5.1. The returned iterated
symbolic state $\theta^\kappa$ is defined also for the artificial variables. Therefore, we restrict $\theta^\kappa$ to regular program
variables at line 7. Finally, having function $\Psi'$ and the iterated symbolic state $\theta^\kappa$, we can compute the
looping condition $\varphi^\kappa$ at line 8. We describe its computation in detail in Section 5.2.



Algorithm 4: overapproximateLoop(C, v)
Input:
    C // a loop at a loop entry vertex v of B
    v // the loop entry vertex
Output:
    φ^κ // a looping condition of C
    θ^κ // an iterated symbolic state of C

1 P' ← P(C, v)
2 B' ← buildBackboneTree(P')
3 (Ψ', Θ') ← executeBackboneTree(B', P')
4 if V_B' = ∅ then return (true, θ_0)
5 (Φ, Θ) ← introduceArtificials(Ψ', Θ')
6 θ^κ ← computeIteratedState(Φ, Θ)
7 θ^κ ← θ^κ|V
8 φ^κ ← compute looping condition from Ψ' and θ^κ
9 return (φ^κ, θ^κ)



5.1 Computation of iterated symbolic state $\theta^\kappa$

Let P be a program, B be a backbone tree of P, and let C be a loop at a loop entry vertex $\alpha$ of B. We
assume in this section that we have already built a backbone tree B' of an induced program P' of the loop
C, and that we have also executed B' symbolically. So we have also computed functions $\Psi'$ and $\Theta'$. We
further assume that $\pi'_1, \ldots, \pi'_n$, where $n = \eta_B(\alpha)$, are all backbone paths of B' and that $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$ are
all path counters introduced at $\alpha$ for the backbone paths of B' respectively.

Our goal in this section is to describe an algorithm computing the iterated symbolic state $\theta^\kappa$. $\theta^\kappa$ is a symbolic
state in which values of program variables are expressed as functions of how many times the backbone paths
of B' are iterated. Those numbers of iterations are captured in the introduced path counters. Therefore, the
resulting iterated symbolic state $\theta^\kappa$ is parametrized by the path counters. It means that for any concrete
values substituted for the path counters in $\theta^\kappa$, we obtain a symbolic state over-approximating those received
by symbolic execution of the backbone paths of B' as many times as defined by the values of the counters.

When B' contains loop entry vertices, values of some variables may depend on the concrete number
of iterations along loops at those loop entry vertices. Since these numbers of iterations may be arbitrary
in different iterations of backbone paths of B', it is difficult to infer functions of path counters for values
of such variables. Of course, we can always express the values as the unknown value $\star$. But we would lose a
lot of precision. On the other hand, a very precise analysis might be computationally expensive. Therefore,
we provide an analysis that remains simple, but is precise enough for the majority of programs our technique is
designed for. We want to be precise in cases when there is a linear relationship between the number of iterations
of a loop of B' and values of path counters introduced at $\alpha$. In all other cases we use the unknown value $\star$.
Let $\gamma$ be a loop entry vertex of B'. Then path counters $\kappa_{\gamma,1}, \ldots, \kappa_{\gamma,m_\gamma}$, where $m_\gamma = \eta_{B'}(\gamma)$, introduced at $\gamma$
identify numbers of iterations of backbone paths of the induced program at $\gamma$. The expression
$\kappa_{\gamma,1} + \cdots + \kappa_{\gamma,m_\gamma}$ defines the total number of iterations of backbone paths of the induced program at $\gamma$.
Therefore, in our analysis we intend to express values of the expression $\kappa_{\gamma,1} + \cdots + \kappa_{\gamma,m_\gamma}$ as a linear function
of path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$. Note that we do not try to compute values of individual path counters $\kappa_{\gamma,j}$.

Because of all this, we do not work directly with functions $\Psi'$ and $\Theta'$, but we first compute their updated
versions $\Phi$ and $\Theta$. The update basically consists of replacing all occurrences of the expression $\kappa_{\gamma,1} + \cdots + \kappa_{\gamma,m_\gamma}$
in $\Phi$ and $\Theta$ by a newly introduced basic symbol $s_\gamma$. This replacement is done for each loop entry vertex $\gamma$ of B'.
We must not forget to eliminate all remaining occurrences of path counters $\kappa_{\gamma,1}, \ldots, \kappa_{\gamma,m_\gamma}$ from both
$\Phi$ and $\Theta$. Since we cannot express their values, we replace them by the unknown symbol $\star$. We depict the
computation of functions $\Phi$ and $\Theta$ in more detail in Algorithm 5. There we first set $\Phi$ and $\Theta$ to be copies
of functions $\Psi'$ and $\Theta'$. Then we apply the substitutions for each loop entry vertex $\gamma$ of B'. At line 4 we
declare the general structure of a looping condition stored in $\Phi$ at the loop entry vertex $\gamma$. Its structure is not
important now. We discuss the structure of a looping condition later in Section 5.2. We replace this formula
by the one stored at line 5. Regardless of the meaning of these formulae, we can check that validity of the formula
at line 4 implies validity of the formula at line 5. We replace the original formula in $\Phi$ by the weaker one
at line 5 to save some precision: If we applied the substitutions to the original formula, we would receive a
formula where antecedents of all implications in it would be of the form $0 \le \tau_{\gamma,i} < \star$. The weaker
formula at line 5 prevents such a substitution and therefore brings more precision after the substitution. At
line 6 we enumerate all remaining vertices of B' such that vertex $\gamma$ is their prefix. For each such vertex $\gamma\beta$ we
apply the substitutions in function $\Phi$ at line 7 and, if $\gamma\beta$ is a leaf vertex of B', then we apply the substitutions
in function $\Theta$ at line 9. Note that each artificial symbol $s_\gamma$ represents the expression $\kappa_{\gamma,1} + \cdots + \kappa_{\gamma,m_\gamma}$.
Therefore, we later compute those linear relationships between artificial symbols $s_\gamma$ and the path counters
$\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$.

We denote by $V_s$ the set of all fresh artificial program variables $s_\gamma$ introduced into $\Theta$, and we denote by $\Sigma_s$
the set of all fresh artificial basic symbols $s_\gamma$ substituted into functions $\Phi$ and $\Theta$. Note that
$V_s \cap V(\mathcal{S}_P) = \emptyset$ and $\Sigma_s \cap \Sigma(\mathcal{S}_P) = \emptyset$.



Algorithm 5: introduceArtificials(Ψ', Θ')

 1 Φ ← Ψ'
 2 Θ ← Θ'
 3 foreach loop entry vertex γ of B' do

 4     Let Φ(γ) = ⋀_{i=1}^{m_γ} (∀τ_{γ,i} (0 ≤ τ_{γ,i} < κ_{γ,i} → ∃τ̄_{γ,i} (0 ≤ τ̄_{γ,i} ≤ κ̄_{γ,i}) ∧ ψ_{γ,i}))

 5     Φ(γ) ← ∀s (0 ≤ s < s_γ → ⋁_{i=1}^{m_γ} ψ_{γ,i}[(Σ_{k=1}^{m_γ} τ_{γ,k})/s][τ_{γ,1}/★, ..., τ_{γ,m_γ}/★])

 6     foreach vertex γβ of B' do
 7         Φ(γβ) ← Φ(γβ)[(Σ_{k=1}^{m_γ} κ_{γ,k})/s_γ][κ_{γ,1}/★, ..., κ_{γ,m_γ}/★]
 8         if γβ is a leaf vertex of B' then
 9             Θ(γβ) ← Θ(γβ)[(Σ_{k=1}^{m_γ} κ_{γ,k})/s_γ][κ_{γ,1}/★, ..., κ_{γ,m_γ}/★]
10 return (Φ, Θ)



We can now move on to the computation of the iterated state $\theta^\kappa$ itself. We define a semi-lattice of all
symbolic states, where we compute $\theta^\kappa$ as a least fix-point of a monotone function defined later. Let us first
describe the semi-lattice. Having $\mathcal{S}(V \cup V_s)$ we can define an order $\le\ = \{(\star, s) \mid s \in \mathcal{S}(V \cup V_s)\}$ on it. Then
$\mathcal{L}_0 = (\mathcal{S}(V \cup V_s), \le)$ is a semi-lattice, where the symbol $\star$ is the least element. Note that $\mathcal{L}_0$ has finite height 2.
We can define an order $\le$ on $\mathcal{M}(\mathcal{S}(V \cup V_s))$ such that
$\le\ = \{(r, s) \mid r, s \in \mathcal{M}(\mathcal{S}(V \cup V_s)) \wedge \forall a \in V \cup V_s \,.\, r(a) \le s(a)\}$;
then $\mathcal{L} = (\mathcal{M}(\mathcal{S}(V \cup V_s)), \le)$ is a map semi-lattice. The least element of $\mathcal{L}$ is the symbolic state $\theta_\star$. Also
note that $\mathcal{L}$ is of finite height $|V \cup V_s|$, since $V \cup V_s$ is finite.

The symbolic state $\theta^\kappa$ is an element of the semi-lattice $\mathcal{L}$ and it is computed by Algorithm 6 as a least
fix-point of a monotone function depicted at lines 3-13 of the algorithm. The algorithm computes Kleene's
sequence leading to $\theta^\kappa$ as follows. At line 1 we set $\theta^\kappa$ to be the least element $\theta_\star$ of $\mathcal{L}$. Then the loop at line 2
computes the following elements of Kleene's sequence. Note that this sequence is always finite, since
$\mathcal{L}$ is of finite height. The monotone function is computed in two loops. The first loop at line 4 computes
for each program variable a an iterated value of its values stored in $\Theta$. The iterated value of a variable
is a function of path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$ expressing values of the variable for any number of iterations
of backbone paths of B'. We discuss the details of this computation in Section 5.1.1. If the iterated value
e is more precise than the current value $\theta^\kappa(a)$, we overwrite it with the iterated one. The second loop at
line 9 computes for each loop entry vertex $\gamma$ of B' a linear function between the artificial basic symbol $s_\gamma$,
representing the expression $\kappa_{\gamma,1} + \cdots + \kappa_{\gamma,m_\gamma}$, and path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$. We discuss the details of
that computation in Section 5.1.2. Whenever the result e is more precise than the value already stored in
$\theta^\kappa$, the content of $\theta^\kappa$ is updated.

Algorithm 6: computeIteratedState(Φ, Θ)

 1 θ^κ ← θ_★
 2 repeat
 3     change ← false
 4     foreach a ∈ V do
 5         e ← iterateVariable(a, Φ, Θ, θ^κ[a → θ_0(a)])
 6         if θ^κ(a) < e then
 7             θ^κ(a) ← e
 8             change ← true
 9     foreach s_γ ∈ V_s do
10         e ← iterationsOfLoop(γ, θ^κ[s_γ → θ_0(s_γ)])
11         if θ^κ(s_γ) < e then
12             θ^κ(s_γ) ← e
13             change ← true
14 until change = false
15 return θ^κ
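The fix-point iteration can be sketched as follows. The sketch assumes a flat ordering in which the unknown value is below every other expression, represents expressions as plain strings, and uses a hypothetical callback `iterate` in place of iterateVariable and iterationsOfLoop; the toy callback and the variable names are invented for the example.

```python
STAR = "*"   # unknown value, the least element of the flat semi-lattice


def compute_iterated_state(variables, iterate):
    """Kleene iteration from the least state until no value becomes more precise."""
    theta = {a: STAR for a in variables}      # least element of the map semi-lattice
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        for a in variables:
            e = iterate(a, theta)
            if theta[a] == STAR and e != STAR:   # e is strictly more precise
                theta[a] = e
                changed = True
    return theta, rounds


def toy_iterate(a, theta):
    """Hypothetical stand-in for iterateVariable on a loop with one counter k."""
    if a == "i":
        return "i + 1*k"
    if a == "j":   # j is expressible only once the iterated value of i is known
        return STAR if theta["i"] == STAR else "j + 2*k"
    return STAR    # no precise iterated value for any other variable


theta, rounds = compute_iterated_state(["j", "i", "t"], toy_iterate)
```

Each variable can move upwards at most once, from the unknown value to a precise one, so the number of rounds is bounded by the number of variables plus one. In the toy run the value of `j` is resolved only in the second round, after `i` has been resolved in the first.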

5.1.1 Computing iterated value of a program variable 

Algorithm 7 computes an iterated value e for a given program variable a. We start with the expression e set to
its related basic symbol at line 1. Then in the loop at line 2 we enumerate backbone paths $\pi'_1, \ldots, \pi'_n$ of B' in
the order in which they are marked. Remember that $n = \eta_B(\alpha)$. Let a backbone path $\pi'_i$ be just enumerated. Then
we update e according to the content of a in $\Theta$ for the current path $\pi'_i$. We read the content of a from $\Theta(\pi'_i)$
at line 3 and store it into e'. Note that the result of the read is immediately followed by substituting $\theta^\kappa$
into it. By the substitution we incorporate already iterated values of other variables into e', since the
value of a may depend on other variables. Then at line 4 we proceed differently for variables of scalar and
array types. Nevertheless, both branches look up the related table to get an iterated value for
a (computed by combining e and e' in the table). This is done at lines 5 and 7. It remains to discuss the
use of Tables 1 and 2. We do that in separate paragraphs.



Algorithm 7: iterateVariable(a, Φ, Θ, θ^κ)

1 e ← θ_0(a)
2 foreach i = 1, 2, ..., n do
3     e' ← Θ(π'_i)(a) θ^κ
4     if a is of a scalar type then
5         e ← apply Table 1 for values (e, e')
6     else
7         e ← apply Table 2 for values (e, e')
8 return e



Iterating values of scalar type We combine the expressions e and e' of Algorithm 7 for a variable a of
a scalar type according to Table 1 into a single iterated value. The expression e represents an iterated value
of a over all already enumerated backbone paths $\pi'_1, \ldots, \pi'_{i-1}$. The expression e' represents the symbolic value
of a after symbolic execution of the backbone path $\pi'_i$ as the last one. We use Table 1 to compute the resulting
iterated value as follows. We try to match expressions e and e' to an expression in the first column and first
row respectively. In case either e or e' fails to match any of the expressions, the resulting iterated value
is $\star$. Otherwise we pick the expression from the table common to the matched row and column.

The expressions in the first row have the structure of all symbolic expressions we are interested in. We want
to compute precise iterated values for them. The first expression identifies the case when a is not written
to along $\pi'_i$ at all. The second expression matches the syntactic structure of expressions whose values follow
some arithmetic progression. Arithmetic progressions are the most common for variables of programs
we are focusing on. For example, the majority of sequential traversals of arrays involve at least one
variable whose values follow some arithmetic progression. The third expression in the first row identifies
symbolic expressions whose values do not depend on iterations of other backbone paths. Typical examples are
variables storing intermediate results and, more importantly, flag variables. For example, programs typically
set or remove flags when scanning an array to check whether the array matches some property or not.
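The arithmetic progression case can be checked on a concrete toy loop. The sketch below assumes a loop with two backbone paths that add constants $d_1$ and $d_2$ to a variable; the function names and the concrete numbers are illustrative only. It confirms that the value after the loop depends only on how many times each path was taken, not on the order of the iterations.

```python
from itertools import permutations


def run_concretely(a0, schedule, increments):
    """Execute the loop body once per entry of `schedule` (a sequence of path ids)."""
    a = a0
    for path in schedule:
        a += increments[path]
    return a


def iterated_value(a0, counters, increments):
    """Closed form a0 + d_1*kappa_1 + d_2*kappa_2 + ... over the path counters."""
    return a0 + sum(increments[path] * count for path, count in counters.items())


increments = {1: 2, 2: -3}       # d_1 = 2 on path 1, d_2 = -3 on path 2
counters = {1: 2, 2: 3}          # kappa_1 = 2, kappa_2 = 3
schedules = set(permutations([1, 1, 2, 2, 2]))   # every interleaving of the iterations
results = {run_concretely(10, s, increments) for s in schedules}
```

All interleavings produce the same final value, which coincides with the closed form over the path counters.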

To fully understand the content of the table we need to discuss the meaning of symbols appearing in it.
First of all, all occurrences of the basic symbol a, all the path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$
and the expression $\star$ are explicit in the table. $k$ is a natural number such that $k < i$, and indices $i_1, \ldots, i_k$ are
distinct natural numbers less than $i$. They represent indices of some of the already
enumerated backbone paths $\pi'_1, \ldots, \pi'_{i-1}$. Symbols $d_i, d_{i_1}, \ldots, d_{i_k}$ are symbolic expressions of P. Any $\rho_j$ is
a symbolic expression which may contain at most the path counter $\kappa_{\alpha,j}$ from the path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$.
Expressions $\psi_{j,k}$ are defined as follows.

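$$\psi_{j,k} = \begin{cases} \kappa_{\alpha,j} > 0 & \text{if } k = 1 \\ \kappa_{\alpha,j} > 0 \wedge \exists \vec{\tau}_j \big(\vec{0} \le \vec{\tau}_j < \vec{\kappa}_j \wedge pc_j \wedge \forall \vec{\tau}'_j (\vec{\tau}_j < \vec{\tau}'_j < \vec{\kappa}_j \to \bigwedge_{r=1}^{k} \neg pc'_{j_r})\big) & \text{otherwise,} \end{cases}$$

where

$$\begin{aligned}
\vec{\tau}_j &= (\tau_1, \ldots, \tau_{j-1}, \tau_{j+1}, \ldots, \tau_k)^T, \\
\vec{\tau}'_j &= (\tau'_1, \ldots, \tau'_{j-1}, \tau'_{j+1}, \ldots, \tau'_k)^T, \\
\vec{\kappa}_j &= (\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,j-1}, \kappa_{\alpha,j+1}, \ldots, \kappa_{\alpha,k})^T, \\
pc_j &= pc_{B'}(\pi'_j, \Phi)\,\theta^\kappa[\vec{\kappa}_j/\vec{\tau}_j], \\
pc'_j &= pc_{B'}(\pi'_j, \Phi)\,\theta^\kappa[\vec{\kappa}_j/\vec{\tau}'_j].
\end{aligned}$$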

A condition $\psi_{j,k}$ determines whether the backbone path $\pi'_j$ was symbolically executed at least once and, if
so, whether it was executed as the last one of the already examined backbone paths where a is modified.
Also note that we substitute $\theta^\kappa$ into the formula $pc_{B'}(\pi'_j, \Phi)$. The substitution incorporates values of already
iterated program variables into the formula.

We also need to clarify the notation used in the expression in the last row and column, where we assume
that the index $i_{k+1}$ represents the value $i$.





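$$\begin{array}{l|l|l|l}
 & a & a + d_i & \rho_i \\ \hline
a & a & a + d_i\kappa_{\alpha,i} & ite(\psi_{i,1}, \rho_i, a) \\ \hline
a + \sum_{j=1}^{k} d_{i_j}\kappa_{\alpha,i_j} & a + \sum_{j=1}^{k} d_{i_j}\kappa_{\alpha,i_j} & a + d_i\kappa_{\alpha,i} + \sum_{j=1}^{k} d_{i_j}\kappa_{\alpha,i_j} & \\ \hline
ite(\psi_{i_1,k}, \rho_{i_1}, \ldots ite(\psi_{i_k,k}, \rho_{i_k}, a) \ldots) & ite(\psi_{i_1,k}, \rho_{i_1}, \ldots ite(\psi_{i_k,k}, \rho_{i_k}, a) \ldots) & \star & ite(\psi_{i_1,k+1}, \rho_{i_1}, \ldots ite(\psi_{i_{k+1},k+1}, \rho_{i_{k+1}}, a) \ldots)
\end{array}$$

Table 1: Combining values e and e' of a variable a of a scalar type.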
Iterating values of array type In Table 2 we assume that variable a is of an array type. We combine the
expressions e and e' (computed in Algorithm 7) according to their syntactic structure. We try to match e
and e' to an expression in the first column and first row respectively. In case either e or e' fails to match any
of the expressions, the resulting iterated value is $\lambda\vec{x}.\star$. Otherwise we pick the expression from the table
common to the matched row and column. We again use the vector notation. In particular, vector $\vec{x}$ represents
formal parameters of a value of a and its dimension therefore matches the dimension of the array. The syntactic
structure of the first expression in the first row identifies the case when a is not written to along $l$ at all.
The second expression in the first row captures a sequence of n writes along backbone path $l$. The outermost
ite expression represents the last write along $l$, while the most nested one represents the first write. The
expressions of the first column have a very similar meaning to those in the first row. The only difference is
that expressions in the first column capture arbitrary iteration of all already processed backbone paths (and
not only a single pass of the current one). The most complicated expressions in the table lie in the last column.
The expression in the second row computes the iteration of all writes along the path $l$. The iterated expression
has a similar structure to the one in the first row. The only difference is that expressions $d_i, s_i$ are transformed
into iterated versions $h_i, t_i$. The expression in the last row and column combines iterations of all writes along
all already iterated paths including the current path $l$. Since the order of writes from
different backbone paths does not matter, we append iterated versions of writes along $l$ as the most nested ite
expressions in the result, i.e. $ite(h_{m+1}, t_{m+1}, \ldots ite(h_{m+n}, t_{m+n}, \lambda\vec{x}.a(\vec{x})) \ldots)$.
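The nesting order of writes can be illustrated with a small Python sketch in which an array value is a function from an index to a value and every write wraps the previous value in a new outermost conditional. The helper `write`, the concrete indices and the tuple used for unwritten elements are assumptions made for the example, and the sketch covers only a single pass along one path, not the iterated versions.

```python
def write(array_value, index, value):
    """Return a new array value whose outermost ite tests the latest write."""
    return lambda x: value if x == index else array_value(x)


def initial(x):
    """Unwritten elements are represented by the uninterpreted read a(x)."""
    return ("a", x)


current = initial
for index, value in [(0, 10), (1, 20), (0, 30)]:   # three writes along one path
    current = write(current, index, value)

last_at_0 = current(0)   # the outermost ite, i.e. the last write to index 0, wins
at_1 = current(1)
untouched = current(5)   # falls through all ite expressions to the original array
```

Reading index 0 returns the value of the third write because it is tested first, which matches the convention that the outermost ite expression represents the last write along the path.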





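$$\begin{array}{l|l|l}
 & \lambda\vec{x}.a(\vec{x}) & \lambda\vec{x}.ite(d_1, s_1, \ldots ite(d_n, s_n, \lambda\vec{x}.a(\vec{x})) \ldots) \\ \hline
\lambda\vec{x}.a(\vec{x}) & \lambda\vec{x}.a(\vec{x}) & \lambda\vec{x}.ite(h_1, t_1, \ldots ite(h_n, t_n, \lambda\vec{x}.a(\vec{x})) \ldots) \\ \hline
\lambda\vec{x}.ite(c_1, r_1, \ldots ite(c_m, r_m, \lambda\vec{x}.a(\vec{x})) \ldots) & \lambda\vec{x}.ite(c_1, r_1, \ldots ite(c_m, r_m, \lambda\vec{x}.a(\vec{x})) \ldots) & \lambda\vec{x}.ite(h_1, t_1, \ldots ite(h_{m+n}, t_{m+n}, \lambda\vec{x}.a(\vec{x})) \ldots)
\end{array}$$

Table 2: Combining values e and e' using $\psi$ of a variable a of an array type.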

To fully understand the meaning of the table, we also need to discuss the structure of expressions $c_i, d_i, h_i$ and
$r_i, s_i, t_i$ appearing inside ite expressions in the table. None of these expressions has an implicit occurrence
of the basic symbol a, a path counter $\kappa \in K$, or a parameter $\tau \in T$. None of them is equal to $\star$. All
occurrences of the mentioned symbols in the expressions are always stated explicitly in their description.

We start with the description of expressions $c_i$. Each $c_i$ declaratively identifies all those indices into the
array where the i-th nested ite expression in e writes during all iterations of all already examined backbone
paths. The indices can be expressed as follows:

$$c_i = \exists \vec{\tau} \,.\, \vec{x} = \vec{u}_i \wedge \vec{0} \le \vec{\tau} < \vec{p}_i \wedge \phi_i(U, P) \wedge \gamma_i.$$

Vector $\vec{x}$ serves only as a placeholder, where actual parameters are substituted when we read from the array.
Vector $\vec{u}_i$ may contain parameters from $\vec{\tau}$ and identifies possible indices where the i-th ite expression may
write its value during the iteration of already examined backbone paths. Therefore, if for some concrete
vector $\vec{\iota}$ of indices into the array there exists a $\vec{\tau}$ such that the formula $(\vec{x} = \vec{u}_i)[\vec{x}/\vec{\iota}]$ is true, then we know
that $\vec{\iota}$ identifies an element of the array overwritten by the i-th ite expression during the iteration. But values
of parameters $\vec{\tau}$ must be realistic: they capture only iterations of examined backbone paths whose numbers of
iterations do not exceed the values of the counters. Therefore, parameters $\vec{\tau}$ are restricted from above by a vector
of expressions $\vec{p}_i$. Expressions in $\vec{p}_i$ possibly (and typically) contain some of the counters $\kappa_1, \ldots, \kappa_p$. Formula
$\phi_i(U, P)$ checks whether $r_i$ is the value which was written to the array last at the element identified by $\vec{x}$.
We discuss the structure of $\phi_i(U, P)$ later. Note that U is the sequence of all vectors $\vec{u}_i$ and P is the sequence
of all vectors $\vec{p}_i$. Each formula $\gamma_i$ uniquely identifies a single path $l$ in B from $l_s$ down to the location of the i-th
write. We use the formula $pc(l)$ to express the condition for $l$.

Expressions $d_i$ have a similar structure to expressions $c_i$, but they capture writes only along the current path
$l$. They have the following structure:

$$d_i = \begin{cases} \exists \vec{\tau} \,.\, \vec{x} = \vec{v}_i \wedge \vec{0} \le \vec{\tau} < \vec{q}_i \wedge \phi_i(V, Q) \wedge \gamma_i & \text{if } \vec{v}_i \text{ contains at least one parameter} \\ \vec{x} = \vec{v}_i & \text{otherwise} \end{cases}$$

The first case matches the situation when path $l$ contains at least one component vertex. Analysing the related
SCCs recursively, we receive imported counters. Therefore, the value of the array was iterated by the procedure
just discussed, which implies the more complicated structure. The second case identifies a common symbolic write
into the array along $l$. Note that V is the sequence of all vectors $\vec{v}_i$ and Q is the sequence of all vectors $\vec{q}_i$.
We discuss the structure of $\phi_i(V, Q)$ later.

Expressions $h_i$ express iterated versions of expressions $d_i$. Since we iteratively combine backbone paths
of $B$ into the resulting $\theta^{\vec\kappa}$, the structure of $e$ actually represents the iterated version of the value of the array (but only for
already examined backbone paths). Therefore, expressions $c_i$ already represent iterated versions of writes
along the examined paths. Since we want to extend the iterated value of $a$ in $\theta^{\vec\kappa}$ by writes along the current $l$, it is obvious
that the structure of expressions $h_i$ is the same as for the expressions $c_i$:

\[
h_i \equiv \exists\vec\tau\,.\; x = w_i \wedge 0 \le \vec\tau < g_i \wedge \phi_i(W,G) \wedge \mathrm{ite}(i \le m,\ \gamma_i,\ \psi[\vec\kappa/\vec\tau]),
\]
\[
w_i = \mathrm{ite}(i \le m,\ u_i,\ v_{i-m}[\vec\kappa/\vec\tau]),
\]
\[
g_i = \mathrm{ite}\big(i \le m,\ p_i,\ \mathrm{ite}(T(v_{i-m}) \ne \emptyset,\ (q_{i-m},\ \kappa(v_{i-m}) \setminus IC(q_{i-m}))^T,\ (IC(q_{i-m}))^T)\big).
\]

Note that vectors $w_i$, $g_i$ are defined to choose the right expression either from $e$ or $e'$. Also note that formula
$\gamma$ is extended by the formula $\mathrm{ite}(i \le m, \neg\psi, \psi)$, where $\psi$ was computed in Algorithm 7. It distinguishes writes
along the current path $l$ from writes along other already examined backbone paths. We discuss the structure of
$\phi_i(W,G)$ below. Only note that $W$ is the sequence of all vectors $w_i$ and $G$ is the sequence of all vectors $g_i$.

Now we can discuss the structure of formulae $\phi_i(Z,B)$. The sequence $Z = \{z_1, \ldots, z_k\}$ contains all those
indices into the array where the array is written to during the iteration of all already examined backbone
paths. Note that each such index is a vector of symbolic expressions of dimension $m$, if $m$ is the dimension of
the array $a$. The sequence $B = \{b_1, \ldots, b_k\}$ contains vectors restricting the values of parameters $\vec\tau$ appearing
in the related indices in $Z$. The formula $\phi_i(Z,B)$ has the following structure:

\[
\phi_i(\{z_1,\ldots,z_k\},\{b_1,\ldots,b_k\}) \equiv \forall\vec\tau'_1,\ldots,\vec\tau'_k\,.\ \Big(\bigwedge_{j=1}^{k} \vec\tau_j < \vec\tau'_j < b_j\Big) \to \Big(\bigwedge_{j=1}^{k} \zeta_{i,j}(z_j)[\vec\tau_j/\vec\tau'_j]\Big),
\]
\[
\zeta_{i,j}(z) \equiv \begin{cases}
x \ne z & \text{if } i \ne j \text{ or some } \tau \text{ appears in } z \\
\mathit{true} & \text{otherwise}
\end{cases}
\]



We see that the formula is not a sentence. It contains free variables, the parameters $\tau_i \in T$, which are exposed
in the formula through the vectors $\vec\tau_j$. Note that two different $\vec\tau_j, \vec\tau_k$, $j \ne k$, may share some parameters. But all
these free variables (parameters) can be stored in a single vector $\vec\tau$, which is exactly the one existentially
quantified in expressions $c_i$, $d_i$ and $h_i$. The formula $\phi_i(Z,B)$ states for given parameters $\vec\tau$ that each write
to the array in any future iteration of the already examined backbone paths will store its value to a different
element of the array than the one indexed by $z_i$. In other words, the formula says that the last value
written to the array at index $z_i$ was written by the $i$-th ite expression in the iteration identified by parameters $\vec\tau$.

Each $s_i$ denotes a symbolic expression written to the array along the current backbone path $l$. We do not
restrict their syntactic structure in any way.

Expressions $r_i$ and $t_i$ have a similar structure, since each $r_i$ represents the iterated version of an expression
written to the array, and each $t_i$ has the same meaning, but it also includes iterated versions of expressions
written along the current backbone path $l$, i.e. iterated versions of expressions $s_i$. We therefore discuss only
the structure of $t_i$.

\[
t_i = \mathrm{ite}(i \le m,\ r_i,\ \hat{s}_i).
\]

We see that for all $i \le m$ we have $t_i = r_i$. Since expressions $r_i$ are already iterated, we do not need to do
anything for them. For all the remaining expressions $s_{i-m}$ (i.e. $m + 1 \le i \le m + n$) we need to compute
their iterated versions before we put them into $t_i$. We express the iterated version of $s_{i-m}$ by the expression

\[
\hat{s}_i = \begin{cases}
(c\,\kappa_l + a(z))\{w_i/x\} & \text{if } s_{i-m} \text{ is of the form } a(z) + c \\
\mathrm{ite}\big(h_i \wedge \phi'_i(W),\ \rho_i(s_{i-m}, v_{i-m})[\kappa_l/\tau_l]\{w_i/x\},\ \star\big) & \text{if } s_{i-m} \text{ is of the form } a(z\kappa_l + y) + c \\
\star & \text{if } s_{i-m} \text{ is any other expression containing } a \\
s_{i-m}[\kappa_l/\tau_l]\{w_i/x\} & \text{otherwise}
\end{cases}
\]

The first case identifies a situation when a single element of the array indexed by $w_i$ is updated several
times during the iteration of $B$ such that the values in the element follow an arithmetic progression. The second
case identifies a situation when sequences of elements of the array follow some arithmetic progressions. We
discuss details of this case below. Whenever $s_{i-m}$ contains $a$, but it is of the syntactic form of neither
the first nor the second case, we return $\star$. The fourth case matches any symbolic expression without $a$
inside it.

In the second case of $\hat{s}_i$ each written element of the array is a part of a linear function. A single
write can produce several lines, and each written element of the array belongs to exactly one of the lines.
The iterated version of $s_{i-m}$ is thus an expression declaratively describing all the lines. But it is not only about
describing the lines. We must also ensure that other writes to the array (along any backbone paths) do not
corrupt them during the iteration. That is the reason for the ite expression in this case. The condition of
the ite expression checks whether the lines stay uncorrupted during the iteration of backbone paths. We have
already discussed the structure of $h_i$. Therefore, it remains to describe the structure of the boolean expression $\phi'_i(W)$.
Remember that $W$ is the sequence of all vectors $w_j$. The structure of $\phi'_i(W)$ is very similar to $\phi_i(W,G)$, since
they have the same purpose: to detect accidental writes to selected array elements.

\[
\phi'_i(\{z_1,\ldots,z_k\}) \equiv \forall\vec\tau'_1,\ldots,\vec\tau'_k\,.\ \Big(\bigwedge_{j=1}^{k} 0 \le \vec\tau'_j < \vec\tau_j\Big) \to \Big(\bigwedge_{j=1}^{k} \zeta_{i,j}(z_j)[\vec\tau_j/\vec\tau'_j]\Big)
\]

Since formula $\phi_i(W,G)$ detects accidental writes in future iterations, the formula $\phi'_i(W)$ needs to check only
overwrites in the previous iterations (see the antecedent of the implication). Note that there are free variables
in $\phi'_i(W)$ (exactly those which are free in $\phi_i(W,G)$), which are existentially bound through $h_i$ (i.e. the scope of
$\exists\vec\tau$ in $h_i$ also covers $\phi'_i(W)$).

Expression $\rho_i(s_{i-m}, v_{i-m})$ identifies the lines. Note that $s_{i-m}$ is of the form $a(z\kappa_l + y) + c$. When
$p = v_{i-m}[\kappa_l/(\kappa_l + 1)] - v_{i-m}$ is a vector identifying differences in indices between subsequent iterations, and
$q = v_{i-m} - z\kappa_l - y$ is a vector identifying differences in indices between the l-value and the r-value, then we require
that $\mathrm{Diag}(p)\,q \ge 0$ and that at least one element of the vector $\mathrm{Diag}(p)\,q$ is strictly greater than 0. If one
of these requirements is not met, then we evaluate $\rho_i(s_{i-m}, v_{i-m})$ to $\star$. Otherwise we define $\rho_i(s_{i-m}, v_{i-m})$
as the expression

\[
\begin{array}{l}
\mathrm{ite}\big((z_1\kappa_l + y_1) \bmod |q_1| = q_{1,0} \wedge \cdots \wedge (z_k\kappa_l + y_k) \bmod |q_k| = q_{k,0}, \\
\quad c\,\kappa_l + a(z_1 q_{1,0} + y_1, \ldots, z_k q_{k,0} + y_k), \\
\quad \ddots \\
\quad \mathrm{ite}\big((z_1\kappa_l + y_1) \bmod |q_1| = q_{1,N} \wedge \cdots \wedge (z_k\kappa_l + y_k) \bmod |q_k| = q_{k,N}, \\
\quad\quad c\,\kappa_l + a(z_1 q_{1,N} + y_1, \ldots, z_k q_{k,N} + y_k), \\
\quad\quad \star\big) \ldots \big),
\end{array}
\]

where $k$ is the dimension of the array, $q_{i,j} = \min\{|q_i|, j\}$ and $N = \max\{|q_1|, \ldots, |q_k|\} - 1$. Note that the presence of
$\star$ in the expression is only technical, to simplify the listing of the formula. The formula can never be evaluated
to that $\star$.

As an example, consider a program expression a[i] <- a[i-5] + 1. Then $\rho_i(a(\kappa + i - 5) + 1, \kappa + i)$,
where the iterated value of i is $\kappa + i$, has five composed ite expressions. Five independent
lines are generated in array a during the iteration. Array elements of these lines are interleaved modulo 5 in a, and
formula $\rho_i$ captures this property.
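
The interleaving can be observed on a concrete run. The following Python sketch is an illustration only and not part of the presented algorithm: it executes the write a[i] = a[i-5] + 1 for consecutive values of i and compares the result with a closed form that groups the written elements into five lines by their offset modulo 5. The names `iterate_write`, `line_of` and `closed_form` are hypothetical, and the sketch assumes i >= 5 so that every read index stays inside the array.

```python
def iterate_write(a, i, kappa):
    """Execute `a[i] = a[i-5] + 1` kappa times, increasing i by one each time."""
    a = list(a)
    for t in range(kappa):
        a[i + t] = a[i + t - 5] + 1
    return a


def line_of(index, i):
    """Identify which of the five interleaved lines a written index belongs to."""
    return (index - i) % 5


def closed_form(a0, i, x):
    """Value stored at written index x, expressed over the initial array a0."""
    t = x - i                    # iteration in which x was written
    steps = t // 5 + 1           # number of +1 steps taken along the line
    start = i + (t % 5) - 5      # initial element at which the line starts
    return a0[start] + steps
```

For an initial array of 30 elements, i = 5 and 12 iterations, the written indices 5, ..., 16 fall into exactly five lines, and every written value agrees with `closed_form`.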



5.1.2 Computing number of iterations of nested Loop 

Here we compute a linear function between a symbol $s_\gamma$, representing the expression $\kappa_{\gamma,1} + \cdots + \kappa_{\gamma,m_\gamma}$, and
the path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$. Of course, if there is no such linear function, we fail to infer it.
It may also be the case that there is a linear relationship, but the coefficients of the function do not form linear
functions over input symbols. In that case, we also fail to compute the function. The main idea behind our
algorithm computing the linear function can be explained as follows.

We start with a precise formulation of a condition identifying whether the value of $s_\gamma$ is linearly dependent on the
path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$ or not. Formula $\Psi(\gamma)$ is a weakened looping condition of $C$. It ensures that each
iteration along the loop $C$ gets back to the entry vertex $\mathit{lst}(\gamma)$, until it is time to leave the loop. Leaving the loop
means following some path in $C$ from the loop entry vertex to one of its exit vertices. Formulae in $\Psi$ along
all these paths identify the leaving condition after the successful iterations in $C$. So we need all these
formulae to describe the iterations of $C$. But these formulae describe the iterations only for a single (you can
imagine the last) iteration of $B'$. To capture an arbitrary (previous) iteration of $B'$ we need to substitute $\theta^{\vec\kappa}$
into these formulae. Therefore, the discussed condition identifying iterations of $C$ can be formally expressed
as

\[
\Gamma(\gamma, \Psi, \theta^{\vec\kappa}) \equiv \bigvee_{\beta \in \{\beta \,\mid\, \gamma\beta \in B \,\wedge\, \mathit{lst}(\beta) \in \mathit{exits}(C)\}} \Big( \bigwedge_{\alpha \in \mathit{pref}(\beta)} \Psi(\gamma\alpha)\,\theta^{\vec\kappa} \Big)
\]

It only remains to state that whenever we have a proper iteration of $C$, identified by $\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$, the number
of its iterations $s_\gamma$ is linearly dependent on the path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$. Let us first discuss the case when
there is no occurrence of a basic symbol of an array type in $\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$. We describe how to deal with arrays
at the end of the section.

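Let $\vec{a}$ be a vector of all basic symbols of scalar types appearing in $\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$. We want to state that
for each concrete input (i.e. for each assignment of concrete values to symbols in $\vec{a}$) there is a vector $\vec{p}$ of
integers and some integer $q$ such that $s_\gamma = \vec{p}^{\,T}\vec\kappa + q$ for each possible choice of concrete values for $s_\gamma$ and
path counters $\vec\kappa$ appearing in $\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$. We can formally write the linear relationship as

\[
\forall\vec{a}\ \exists\vec{p}, q\ \forall\vec\kappa, s_\gamma\ \Big( \big(\vec\kappa \ge 0 \wedge s_\gamma \ge 0 \wedge \Gamma(\gamma, \Psi, \theta^{\vec\kappa})\big) \to s_\gamma = \max\{0,\ \vec{p}^{\,T}\vec\kappa + q\} \Big).
\]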

The presence of the function max in the formula handles cases when the linear relation would imply a negative value for $s_\gamma$.
But $s_\gamma$ is a natural number, and a negative value for $s_\gamma$ only implies that $C$ is not iterated at all. Therefore, in
such situations we provide the alternative choice for $s_\gamma$ to be equal to 0.

In the presented formula the values $\vec{p}, q$ may vary for each choice of concrete input values $\vec{a}$. Although
an SMT solver may give us an answer that the given formula is valid, we can only conclude that there indeed is
a linear relationship between the number of iterations of $C$ and the values of the path counters. But we do not know
the relationship itself. To force an SMT solver to compute the linear relationship for us we do the following.
We restrict ourselves only to those linear functions whose coefficients are some fixed linear combinations
of input values. In other words, we only focus on those relationships where all variations of $\vec{p}, q$ for different
inputs $\vec{a}$ can be captured by a single (fixed) linear combination of input values $\vec{a}$. This restriction allows us
to move the existential quantification to the front of the formula, and we get

\[
\exists M, w\ \forall\vec{a}, \vec\kappa, s_\gamma\ \Big( \big(\vec\kappa \ge 0 \wedge s_\gamma \ge 0 \wedge \Gamma(\gamma, \Psi, \theta^{\vec\kappa})\big) \to s_\gamma = \max\Big\{0,\ (M\vec\kappa + w)^T \binom{\vec{a}}{1}\Big\} \Big),
\]

where $M$ and $w$ are, respectively, a matrix and a vector of unknown integers to be computed by an SMT solver.
If the formula is satisfiable, then we can get the integers as a part of a model of the formula from an SMT
solver. These integers define the linear combinations of inputs we wanted. Although the type of $M$ and the
dimension of $w$ might be clear from the formula, we state them explicitly. If the number of basic symbols in the
formula (i.e. the dimension of $\vec{a}$) is $m$ and the number of path counters in the formula (i.e. the dimension of $\vec\kappa$) is $n$,
then the type of the matrix $M$ is $(m + 1) \times n$ and the dimension of $w$ is $m + 1$.
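
The shape of the searched relationship can be illustrated on a small hypothetical nested loop, `for (i = 0; i < n; ++i) for (j = i; j < m; ++j)`, with one outer path counter $\kappa$ and inputs $\vec{a} = (n, m)$, so that $M$ and $w$ both have three entries. The Python sketch below replaces the SMT query by a bounded enumeration of small integer coefficients checked against concrete runs. This is only a stand-in chosen for illustration, not the way the relationship is computed in the paper, and all function names are assumptions.

```python
from itertools import product


def inner_iterations(m, kappa):
    """Count iterations of the inner loop in outer iteration `kappa` of
    for (i = 0; i < n; ++i) for (j = i; j < m; ++j) { ... }"""
    j, count = kappa, 0            # i equals kappa in that outer iteration
    while j < m:
        j += 1
        count += 1
    return count


def candidate(M, w, a, kappa):
    """Evaluate max{0, (M*kappa + w)^T (a, 1)} for a single path counter."""
    coeffs = [M[r] * kappa + w[r] for r in range(len(w))]
    return max(0, sum(c * v for c, v in zip(coeffs, list(a) + [1])))


def find_linear_count(bound=6, values=(-1, 0, 1)):
    """Search small integer M, w by enumeration (stand-in for the SMT query)."""
    samples = [(n, m, k) for n in range(bound) for m in range(bound)
               for k in range(n)]  # k < n: outer iteration k really happens
    for M in product(values, repeat=3):
        for w in product(values, repeat=3):
            if all(candidate(M, w, (n, m), k) == inner_iterations(m, k)
                   for n, m, k in samples):
                return M, w
    return None
```

On the sampled inputs the only coefficients that fit are M = (0, 0, -1) and w = (0, 1, 0), i.e. the inner loop iterates max{0, m - κ} times. Unlike the quantified formula, the enumeration checks only finitely many inputs, so it gives evidence rather than a proof.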

The last formula would be the result if modern SMT solvers performed well on it. We have
experimented with the powerful SMT solver Z3, but the performance was poor. We found very simple instances
of the formula where it took the SMT solver several minutes to check satisfiability of each of them.
We discovered that the performance issue lies in nested universal quantifiers brought to the formula through
looping conditions. Fortunately, we do not need to express all iterations of $C$ in each iteration of $B$. It is
sufficient for the relationship to ensure that we stay in $C$ in the $(s_\gamma - 1)$-st iteration of $C$ in each iteration of $B$.
(Leaving $C$ in the $s_\gamma$-th iteration is then ensured by the formulae collected from $\Psi$ along paths to exit vertices.)
Therefore, if $\Psi(\gamma)$ is a looping condition of the form $\forall s\,(0 \le s < s_\gamma \to \psi)$, then we can replace it by the condition
$0 \le s_\gamma - 1 \to \psi[s/(s_\gamma - 1)]$. The Z3 SMT solver is able to decide satisfiability of such updated formulae in tens
of milliseconds, which is a significant performance improvement. To integrate the modification into our last
formula, we formally introduce a formula $\Omega$ defined on vertices of $B$ as follows

\[
\Omega(\gamma) \equiv \begin{cases}
0 \le s_\gamma - 1 \to \psi[s/(s_\gamma - 1)] & \text{if } \gamma \text{ is a component vertex, where } \Psi(\gamma) \equiv \forall s\,(0 \le s < s_\gamma \to \psi) \\
\Psi(\gamma) & \text{otherwise}
\end{cases}
\]

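Using $\Omega$ we can finally define a formula $S_\gamma$

\[
S_\gamma \equiv \exists M, w\ \forall\vec{a}, \vec\kappa, s_\gamma\ \Big( \big(\vec\kappa \ge 0 \wedge s_\gamma \ge 0 \wedge \Gamma(\gamma, \Omega, \theta^{\vec\kappa})\big) \to s_\gamma = \max\Big\{0,\ (M\vec\kappa + w)^T \binom{\vec{a}}{1}\Big\} \Big),
\]

whose satisfiability we check to compute the relationship.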
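
The effect of replacing the universally quantified looping condition by its single last instance can be checked on small cases. The Python sketch below is an illustration under the assumption that the body of the looping condition is given as a Python predicate over the iteration number. It evaluates both forms directly, and the names `looping_condition` and `weakened_condition` are hypothetical.

```python
def looping_condition(psi, s_gamma):
    """Original form: forall s (0 <= s < s_gamma -> psi(s))."""
    return all(psi(s) for s in range(s_gamma))


def weakened_condition(psi, s_gamma):
    """Weakened form: 0 <= s_gamma - 1 -> psi(s_gamma - 1)."""
    return s_gamma - 1 < 0 or psi(s_gamma - 1)
```

The weakened form is implied by the original one. The two forms agree when the predicate can only switch from true to false as the iteration number grows (as for a loop guard such as s < 4), and they differ when it fails only in some middle iteration.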

The last thing to be discussed is an occurrence of arrays in $\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$. Although we try to express $s_\gamma$
as a linear function whose coefficients are some fixed linear combinations of input values of only scalar
types, the presence of array symbols in $\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$ may strongly affect the existence of such a relationship. We
must ensure that a relation exists not only for all possible values of symbols of scalar types, but also for
all possible contents of arrays. Unfortunately, we cannot quantify a function symbol in a first order language.
Therefore, we solve the problem in two steps. First, we introduce a fresh function symbol $\rho$. This function
accepts as arguments $\vec{a}$, i.e. the whole input to symbols of scalar types. The function returns a unique integer
for each input $\vec{a}$, which means that $\rho$ is injective. Formally speaking, we add the following axiom into
the extended theory $T_P$.

\[
\forall\vec{a}_1, \vec{a}_2\ \big( \rho(\vec{a}_1) = \rho(\vec{a}_2) \to \vec{a}_1 = \vec{a}_2 \big).
\]

The second step is to replace each function symbol application $A(e_1, \ldots, e_k)$ occurring in
$\Gamma(\gamma, \Psi, \theta^{\vec\kappa})$ by an application $A(e_1, \ldots, e_k, \rho(\vec{a}))$. It means that basic symbols of array types have changed
their type such that their dimension has been increased by one. This way we ensure that for each assignment
to $\vec{a}$ we have fresh contents of all arrays for checking satisfiability of our formula $S_\gamma$.

We are now ready to describe, in Algorithm 8, the computation of the expression identifying the number of iterations
of $C$ as a linear function of path counters of $B$. We assume that the axiom for the function $\rho$ is automatically
inserted into the extended theory $T_P$ of $P$.



Algorithm 8: iterationsOfComponent($\gamma$, $\theta^{\vec\kappa}$)

    $S_\gamma \leftarrow \exists M, w\ \forall\vec{a}, \vec\kappa, s_\gamma\ ((\vec\kappa \ge 0 \wedge s_\gamma \ge 0 \wedge \Gamma(\gamma, \Omega, \theta^{\vec\kappa})) \to s_\gamma = \max\{0, (M\vec\kappa + w)^T \binom{\vec{a}}{1}\})$
    Extend all function symbol applications in $S_\gamma$ by an extra parameter $\rho(\vec{a})$
    if $S_\gamma$ is satisfiable (ask an SMT solver) then
        retrieve $M$, $w$ from a model of $S_\gamma$ computed by the SMT solver
        return $\max\{0, (M\vec\kappa + w)^T \binom{\vec{a}}{1}\}$
    return $\star$






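5.2 Computation of looping condition $\varphi^{\vec\kappa}$

Let $\alpha$ be a loop entry vertex of $B$ of a program $P$. Then we can build a backbone tree $B'$ of the induced
program of the loop at the loop entry vertex. Let $\pi'_1, \ldots, \pi'_n$ be all the backbone paths of $B'$, where the
number $n$ depends on $\alpha$. After symbolic execution of $B'$ we receive a filled-in function $\Psi'$, and according to Section 5.1 we can then
also compute an iterated symbolic state $\theta^{\vec\kappa}$ of $B'$. Now we are ready to express a looping condition $\varphi^{\vec\kappa}$ of the
loop, over-approximating all path conditions representing feasible paths iterating the tree $B'$. The formula $\varphi^{\vec\kappa}$ is
defined as follows

\[
\varphi^{\vec\kappa} \equiv \bigwedge_{i=1}^{n} \Big( \forall\tau_i\ \big( 0 \le \tau_i < \kappa_{\alpha,i} \to \exists\vec\tau_i\ ( 0 \le \vec\tau_i \le \vec\kappa_i \wedge pc_{B'}(\pi'_i, \Psi')\,\theta^{\vec\kappa}[\kappa_{\alpha,1}/\tau_1, \ldots, \kappa_{\alpha,n}/\tau_n] ) \big) \Big),
\]

where

\[
\vec\tau_i = (\tau_1, \ldots, \tau_{i-1}, \tau_{i+1}, \ldots, \tau_n), \qquad
\vec\kappa_i = (\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,i-1}, \kappa_{\alpha,i+1}, \ldots, \kappa_{\alpha,n}).
\]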

The formula can be explained as follows. A feasible path iterating in the induced program of the loop gives us
concrete values of the path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$. For each path counter (i.e. for all its concrete values) the
looping condition must ensure that the backbone path $\pi'_i$ is executed at least $\kappa_{\alpha,i}$ times. Therefore, for each
execution number between $0$ and $\kappa_{\alpha,i} - 1$ there must exist actual execution numbers $\tau_1, \ldots, \tau_{i-1}, \tau_{i+1}, \ldots, \tau_n$
of the remaining backbone paths lying within their limits (i.e. $0 \le \tau_j \le \kappa_{\alpha,j}$) such that the execution of $\pi'_i$ is possible,
i.e. the path condition of $\pi'_i$ is satisfiable. This must be ensured for execution numbers of all backbone paths.
Note that we substitute $\theta^{\vec\kappa}$ into the path condition $pc_{B'}(\pi'_i, \Psi')$. This is necessary because values in the path
condition capture only a single execution along the path. The substitution converts those values into functions of
path counters, so they represent any possible number of iterations of backbone paths. Also note that we do
not have to ensure in the looping condition that the path $\pi'_i$ is also executed at most $\kappa_{\alpha,i}$ times. This property
is handled by backbone paths in $B$, since they contain paths from loop entry vertices to all possible loop
exits. Assertions along these paths do the job.

Lemma 2. The only free variables in $\varphi^{\vec\kappa}$ are the path counters $\kappa_{\alpha,1}, \ldots, \kappa_{\alpha,n}$.

Proof. Obvious. □

Lemma 3. Let $\Psi(\alpha)$ be updated to a formula $\varphi^{\vec\kappa}$ computed as described above. Then for any path condition
$\varphi$ representing a feasible path iterating $B'$ the sentence $\varphi \to pc_B(\alpha, \Psi, \mathit{true})$ is valid.

Proof. It directly follows from the construction of $pc_B(\alpha, \Psi, \mathit{true})$. □



5.3 Discussing Relaxations 

We finish the computation of the loop over-approximation with a brief discussion of the relaxations we use in the
computation of $\varphi$. There are many loops in real-world programs where the interleaving of paths through the
loops is not important for reasoning about conditions below them. For example, many C++ programs
manipulate sequential containers by calling Standard Template Library functions like copy, find, find_if,
transform, for_each, count, count_if. Loops in these functions commonly have this property. It is also
very common that iterations of loops are controlled by values following monotone progressions. Consider, for
example, the concept of iterators in the C++ Standard Template Library. Branchings below such loops are mostly
dependent on the final state of these progressions. Therefore, using the relaxations we can compute $\varphi$ such
that it is well balanced between complexity and precision.



6 Soundness and Incompleteness 

In this section we formulate and prove soundness and incompleteness theorems for our algorithm. 

Theorem 1 (Soundness). Let $\varphi$ be the necessary condition computed by our algorithm for a given target
program location. If $\varphi$ is not satisfiable, then the target location is not reachable in that program.

Informal proof. We build every looping condition $\varphi^{\vec\kappa}$ such that it is implied by all path conditions of the
analysed loop. Each formula $pc_B(\pi_i, \Psi)$ collects all the predicates along the backbone $\pi_i$, and it also collects
all looping conditions at loop entries along the path. Therefore, $pc_B(\pi_i, \Psi)$ must be implied by any path
condition of any symbolic execution along $\pi_i$. We compute $\varphi$ as a disjunction of formulae $\varphi_i$ for all backbones.
Since any program path leading to the target location must follow some backbone (with possible temporary
escapes into loops along the backbone), its path condition exists (i.e. it is a satisfiable formula) only if $\varphi$ is
satisfiable. □

Theorem 2 (Incompleteness). There is a program and an unreachable target location in it for which the
formula $\varphi$ computed by our algorithm is satisfiable.

Proof. Let us consider the following C code:

    int i = 1; while (i < 3) { if (i == 2) i = 1; else i = 2; }

The loop never terminates. Therefore, a program location below it is not reachable. But $\varphi$ computed for
that location is equal to true, since variable i does not follow a monotone progression. □
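
That the loop in the proof never terminates can be confirmed by simulating it, because its whole state is the single variable i: once a value of i repeats, the execution cycles forever. The Python sketch below is a direct transcription of the C loop made for illustration. The name `loop_terminates` and the step budget are assumptions.

```python
def loop_terminates(i=1, max_steps=1000):
    """Simulate `while (i < 3) { if (i == 2) i = 1; else i = 2; }`.

    The only state is i, so a repeated value proves nontermination.
    """
    seen = set()
    for _ in range(max_steps):
        if not i < 3:
            return True          # the loop exit is reached
        if i in seen:
            return False         # the state repeats: the loop cycles forever
        seen.add(i)
        i = 1 if i == 2 else 2
    return None                  # undecided within the step budget
```

Starting from i = 1 the values alternate between 1 and 2, so the function reports nontermination after three steps.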



7 Dealing with Quantifiers 

We can ask an SMT solver whether a computed necessary condition $\varphi$ is satisfiable or not. If it is,
we may further ask for a model of it. As we will see in Section 8, such queries to a solver should be fast.
Unfortunately, our experience with solvers shows that the presence of quantifiers in $\varphi$ usually causes performance
issues. Although SMT technology evolves quickly, we show in this section how to overcome this issue now
by unfolding the universally quantified formulae that the looping conditions $\varphi^{\vec\kappa}$ are made of.

Universally quantified variables $\tau_i$ in $\varphi^{\vec\kappa}$ are always bounded from above by the path counters $\kappa_i$ counting
iterations of backbones $\pi_i$ of the analysed loop. Let us choose some upper limits $K_i > 0$ for the path counters
$\kappa_i$. Since each $\tau_i$ now ranges over a finite set of integers $\{0, \ldots, K_i - 1\}$, we can unfold each universally
quantified formula in $\varphi^{\vec\kappa}$ for each possible value of $\tau_i$. Having eliminated the universal quantification, we
can also eliminate the existential quantification of all $\kappa_i$ and all $\vec\tau_j$ by redefining them as uninterpreted integer
constants. For given upper limits $\vec{K}$ for the path counters $\vec\kappa$ we denote the unfolded necessary condition $\varphi$
by $\varphi^{\vec{K}}$.

For any $\vec{K}$ the formula $\varphi^{\vec{K}}$ represents a weakened $\varphi$. The higher the values we choose in $\vec{K}$, the closer we get to
the precision of $\varphi$. In practice we must choose moderate values of $\vec{K}$, since the unfolding process makes $\varphi^{\vec{K}}$
much longer than $\varphi$.
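
The unfolding of one bounded universal quantifier, and the weakening it causes, can be sketched as follows. The Python code is an illustration under the assumption that the quantified body is given as a predicate over the quantified variable and that the unfolding produces one implication per value below the chosen limit K. The names `quantified` and `unfolded` are hypothetical, and the formulae are evaluated directly instead of being passed to a solver.

```python
def quantified(body, kappa):
    """forall tau (0 <= tau < kappa -> body(tau)), evaluated directly."""
    return all(body(tau) for tau in range(kappa))


def unfolded(body, kappa, K):
    """Conjunction over tau = 0, ..., K-1 of (tau < kappa -> body(tau))."""
    return all((not tau < kappa) or body(tau) for tau in range(K))
```

For counter values up to K both forms agree. For larger counter values the unfolded form checks only the first K instances, so it can hold where the quantified form fails, which is the sense in which the unfolded condition is weaker.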

In some cases an SMT solver is able to quickly decide satisfiability of $\varphi$. Therefore, we ask the solver for
satisfiability of $\varphi$ in parallel with the unfolding procedure described above, with a common timeout
for both queries. We take the fastest answer. In case both queries exceed the timeout, the condition $\varphi$
cannot help a tool to cover the given target location.
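
One possible way to organise the two parallel queries with a common timeout is sketched below in Python using the standard `concurrent.futures` module. The solver calls are simulated by sleeping functions, and `simulated_query` and `first_answer` are hypothetical names. The sketch does not describe how Apc is implemented, and it does not cancel the slower query, which a real tool would typically run as a separate solver process that can be terminated.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def simulated_query(delay, answer):
    """Build a stand-in for a solver call that needs `delay` seconds."""
    def run():
        time.sleep(delay)
        return answer
    return run


def first_answer(queries, timeout):
    """Run all queries in parallel and return (name, result) of the first one
    finishing within the common timeout, or None if none finishes in time."""
    pool = ThreadPoolExecutor(max_workers=len(queries))
    futures = {pool.submit(fn): name for name, fn in queries.items()}
    done, _ = wait(futures, timeout=timeout, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)    # do not block on the slower query
    if not done:
        return None
    winner = next(iter(done))
    return futures[winner], winner.result()
```

With a slow direct query and a fast query on the unfolded formula, the result of the unfolded query is returned. When every query is slower than the timeout, the function returns None, which corresponds to the case where the condition cannot help.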



8 Integration into Tools 

Tools typically explore program paths iteratively. At each iteration there is a set of program locations
$\{v_1, \ldots, v_k\}$ from which the symbolic execution may continue further. At the beginning the set contains
only the program entry location. In each iteration of the symbolic execution the set is updated such that actions
of program edges going out from some locations $v_i$ are symbolically executed. Different tools use different
systematic and heuristic strategies for selecting the locations $v_i$ to be processed in the current iteration. It is
also important to note that for each $v_i$ there is available an actual path condition $\varphi_i$ capturing the already taken
symbolic execution from the entry location up to $v_i$.

When a tool detects difficulties in some iteration to cover a particular program location, then using $\varphi$ it
can restrict the selection from the whole set $\{v_1, \ldots, v_k\}$ to only those locations $v_i$ for which the formula $\varphi_i \wedge \varphi$ is
satisfiable. In other words, if for some $v_i$ the formula $\varphi_i \wedge \varphi$ is not satisfiable, then we are guaranteed there is
no real path from $v_i$ to the target location. Therefore, $v_i$ can safely be removed from consideration.

Tools like Sage, Pex or Cute combine symbolic execution with concrete execution. Let us assume that a
location $v_i$, for which the formula $\varphi_i \wedge \varphi$ is satisfiable, was selected in the current iteration. These tools require
a concrete input to the program to proceed further from $v_i$. Such an input can directly be extracted from
any model of the formula $\varphi_i \wedge \varphi$.
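
The selection step described above can be sketched as follows. The Python code is only an illustration: formulae are modelled as predicates over program inputs, and the SMT query is replaced by a brute-force search over a small input domain. The names `find_model` and `filter_frontier`, as well as the example path conditions in the usage note, are hypothetical.

```python
from itertools import product


def find_model(formula, domain, arity):
    """Brute-force stand-in for an SMT query: first satisfying input or None."""
    for candidate in product(domain, repeat=arity):
        if formula(*candidate):
            return candidate
    return None


def filter_frontier(frontier, phi, domain, arity):
    """Keep only locations whose path condition is satisfiable together with
    the necessary condition phi, and attach a concrete input to each."""
    kept = {}
    for location, phi_i in frontier.items():
        model = find_model(lambda *xs: phi_i(*xs) and phi(*xs), domain, arity)
        if model is not None:
            kept[location] = model
    return kept
```

For instance, with the necessary condition x + y = 7 and three locations whose path conditions are x > 5, x < 0 and y < 0, and y = x + 1, the second location is removed, and the models found for the remaining two can serve as the concrete inputs a concolic tool needs to continue from them.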

9 Experimental Results 

We implemented the algorithm in an experimental program, which we call Apc. We also prepared a small 
set of benchmark programs mostly taken from other papers. In each benchmark we marked a single location 
as the target one. All the benchmarks have a huge number of paths, so it is difficult to reach the target. We
ran Pex and Apc on the benchmarks and we measured the times until the target locations were reached. This
measurement is obviously unfair from Pex perspective, since its task is to cover an analysed benchmark 
by tests and not to reach a single particular location in it. Therefore, we clarify the right meaning of the 
measurement now. 

Our only goal here is to show that Pex could benefit from our algorithm. A typical scenario when running
Pex on a benchmark is that all the code except the target location is covered in a few seconds (typically up
to three). Then Pex keeps searching the space of program paths for a longer time without covering the target
location. This is exactly the situation when our heuristic should be activated. We of course do not know the
exact moment when Pex would activate it. Therefore, we can only provide running times of our heuristic
as if it were activated at the beginning of the analysis.

Before we present the results, we discuss the benchmarks. Benchmark HWM checks whether an input
string contains the four substrings Hello, world, at and Microsoft!. It does not matter at which position and
in which order the words occur in the string. The target location can be reached only when all the words
are presented in the string. This benchmark was introduced in jj. The benchmark consists of four loops in 
a sequence, where each loop searches for one of the four words mentioned above. Each loop checks for
an occurrence of the related word at each position in the input string, starting from the beginning. Benchmark
HWM is the most complicated one in our set of benchmarks. We also took its two simplified versions
presented in [22]: benchmark HW consists of two loops searching the input string for the first two words
above, and benchmark Hello searches only for the first one.

Benchmark MatrIR scans the upper triangle of an input matrix. The matrix can be of any size larger than
20 x 20. In each row we count the number of elements inside a fixed range (10, 100). When the sum of the counts
from all the rows exceeds a fixed limit 15, the target location is reached.

Benchmarks OneLoop and TwoLoops originate from [22]. They are designed such that their target
locations are not reachable. Both benchmarks contain a loop in which the variable i (initially set to 0) is
increased by 4 in each iteration. The target location is then guarded by an assertion i==15 in the OneLoop
benchmark and by a loop while (i != j+7) j += 2 in the second one. We note that j is initialized to 0
before the loop.
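
The reason why both targets are unreachable is a simple divisibility argument: i is always a multiple of 4, so it never equals 15, and j + 7 is always odd (j stays even), so it never equals the even value of i and the second loop cannot exit. The Python sketch below checks both facts for bounded numbers of iterations. It works over mathematical integers, ignores machine overflow, and uses hypothetical function names and bounds.

```python
def one_loop_reaches_target(max_iterations=1000):
    """OneLoop: i starts at 0 and grows by 4; the target requires i == 15."""
    return any(4 * k == 15 for k in range(max_iterations + 1))


def two_loops_can_exit(max_iterations=1000, max_steps=1000):
    """TwoLoops: leaving `while (i != j + 7) j += 2` requires i == j + 7."""
    for k in range(max_iterations + 1):
        i = 4 * k                          # value of i after k iterations
        if any(i == 2 * t + 7 for t in range(max_steps + 1)):
            return True                    # j == 2*t would end the loop
    return False
```

Both functions return False for the given bounds. The bounded check is only supporting evidence, while the parity argument covers all iteration counts.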

The last benchmark WinDriver comes from practice and we took it from [14]. It is a part of a Windows
driver processing a stream of network packets. It reads an input stream and decomposes it into a two
dimensional array of packets. The position in the array to which the data from the stream are copied is
encoded in the input stream itself. We marked the target location as a failure branch of a consistency check
of the filled in array. It was discussed in the paper Q3] the consistency check can indeed be broken. 

The experimental results are shown in Table 3. They show running times in seconds of Pex and Apc
on the benchmarks. We did all the measurements on a single common desktop computer.¹ The mark T/O
in the Pex column indicates that it failed to reach the target location within an hour. For Apc we provide the
total running times and also time profiles of different parts of the computation. In sub-column 'Bld $\varphi$' there
are times required to build the necessary condition $\varphi$. In sub-column 'Unf/SMT $\varphi^{\vec{K}}$' there are two times
for each benchmark. The first number identifies the time spent by unfolding the formula $\varphi$ into $\varphi^{\vec{K}}$. We use a
fixed number 25 for all the counters and benchmarks. The second number represents the time spent by the Z3 SMT
solver [31] to decide satisfiability of the unfolded formula $\varphi^{\vec{K}}$. Characters in front of these times identify
results of the queries: S for satisfiable, U for unsatisfiable and X for unknown. The last sub-column
'SMT $\varphi$' contains running times of the Z3 SMT solver directly on formulae $\varphi$. The mark M/O means that Z3
ran out of memory. As we explained in Section 7, the construction and satisfiability checking of $\varphi^{\vec{K}}$ runs in
parallel with satisfiability checking of $\varphi$. Therefore, we take the minimum of the times to compute the total
running time of Apc.

                         Apc
Benchmark    Pex      Total    Bld $\varphi$    Unf/SMT $\varphi^{\vec{K}}$    SMT $\varphi$
Hello        5.257    0.181    0.021    0.290 / S 0.060     S 0.160
HW           25.05    0.941    0.073    0.698 / S 0.170     S 13.84
HWM          T/O      4.660    1.715    2.135 / S 0.810     X M/O
MatrIR       95.00    0.035    0.015    0.491 / S 70.80     S 0.020
WinDriver    28.39    0.627    0.178    0.369 / S 0.080     X 4.860
OneLoop      134.0    0.003    0.001    0.001 / U 0.001     U 0.010
TwoLoops     64.00    0.003    0.002    0.004 / U 0.010     U 0.001

Table 3: Running times of Pex and Apc on benchmarks.

¹ Intel® Core™ i7 CPU 920 @ 2.67GHz, 6GB RAM, Windows 7 Professional 64-bit, MS Pex 0.92.50603.1, MS
Moles 1.0.0.0, MS Visual Studio 2008, MS .NET Framework v3.5 SP1, MS Z3 SMT solver v3.2, and Boost v1.42.0.
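
The way the total running times are composed can be rechecked from the reported numbers. The Python sketch below recomputes each total as the build time plus the faster of the two parallel branches. It rests on one assumption that the text does not state explicitly, namely that a branch ending with the result X (unknown, including M/O) provides no answer and is therefore ignored. The names `ROWS` and `expected_total` are hypothetical.

```python
# benchmark: (total, build, unfold, smt_unfolded, unfolded_result,
#             smt_direct, direct_result); None marks the M/O entry
ROWS = {
    "Hello":     (0.181, 0.021, 0.290, 0.060, "S", 0.160, "S"),
    "HW":        (0.941, 0.073, 0.698, 0.170, "S", 13.84, "S"),
    "HWM":       (4.660, 1.715, 2.135, 0.810, "S", None,  "X"),
    "MatrIR":    (0.035, 0.015, 0.491, 70.80, "S", 0.020, "S"),
    "WinDriver": (0.627, 0.178, 0.369, 0.080, "S", 4.860, "X"),
    "OneLoop":   (0.003, 0.001, 0.001, 0.001, "U", 0.010, "U"),
    "TwoLoops":  (0.003, 0.002, 0.004, 0.010, "U", 0.001, "U"),
}


def expected_total(build, unfold, smt_unfolded, unfolded_result,
                   smt_direct, direct_result):
    """Build time plus the faster branch that returned a definite answer."""
    branches = []
    if unfolded_result in ("S", "U"):
        branches.append(unfold + smt_unfolded)
    if direct_result in ("S", "U") and smt_direct is not None:
        branches.append(smt_direct)
    return build + min(branches)
```

Under this reading all seven reported totals are reproduced to the printed precision, for example 0.021 + min(0.290 + 0.060, 0.160) = 0.181 for Hello.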

10 Related Work 

Early work on symbolic execution [20] HH] showed its effectiveness in test generation. King further 
showed that symbolic execution can bring more automation into Floyd's inductive proving method [20, 18].
Nevertheless, loops as the source of the path explosion problem were not in the center of interest. 

More recent approaches dealt mostly with limitations of SMT solvers and the environment problem by 
combining the symbolic execution with the concrete one [HI H HH [HI H31 HOI EE H31 H3J- Although 
practical usability of the symbolic execution improved, these approaches still suffer from the path explosion 
problem. An interesting idea is to combine the symbolic execution with a complementary technique |161 
ITS] [2] EU [17] . Complementary techniques typically perform differently on different parts of the analysed 
program. Therefore, an information exchange between the techniques leads to a mutual improvement of their 
performance. There are also techniques based on saving of already observed program behaviour and early 
terminating those executions, whose further progress will not explore a new one [U [B]. Compositional 
approaches are typically based on computation of function summaries [T]. A function summary often 
consists of pre and post condition. Preconditions identify paths through the function and postconditions 
capture effects of the function along those paths. Reusing these summaries at call sites typically leads to an 
interesting performance improvement. In addition the summaries may insert additional symbolic values into 
the path condition which causes another improvement. And there are also techniques partitioning program 
paths into separate classes according to similarities in program states [23] [55J . Values of output variables 
of a program or function are typically considered as a partitioning criteria. A search strategy Fitnex [2U] 
implemented in Pex [28] uses state-dependent fitness values computed through a fitness function to guide 
a path exploration. The function measures how close an already discovered feasible path is to a particular 
target location (to be covered by a test). The fitness function computes the fitness value for each occurrence 
of a predicate related to a chosen program branching along the path. The minimum value is the resulting 
one. There are also orthogonal approaches dealing with the path explosion problem by introducing some 
assumptions about program input. There are, for example, specialized techniques for programs manipulating 
strings [3] [30] , and techniques reducing input space by a given grammar [10l [26] . 

Although the techniques above showed performance improvements when dealing with the path explosion 
problem, they do not focus directly on loops. The LESE [55] approach introduces symbolic variables for
the number of times each loop was executed and links these with features of a known input grammar
such as variable-length or repeating fields. This allows the symbolic constraints to cover a class of paths 
that includes different number of loop iterations, expressing loop-dependent program values in terms of 
the input. A technique presented in [15] analyses loops on-the-fly, i.e. during simultaneous concrete and 
symbolic execution of a program for a concrete input. The loop analysis infers inductive variables. A 
variable is inductive if it is modified by a constant value in each loop iteration. These variables are used 
to build loop summaries expressed in a form of pre a post conditions. The summaries are derived from 
the partial loop invariants synthesized dynamically using pattern matching rules on the loop guards and 
induction variables. In our previous work |22| we introduced an algorithm sharing the same goal as one 
presented here. Nevertheless, in |22j we transform an analysed program into chains and we do the remaining 
analysis there. For each chain with sub-chains we build a constraint system serving as an oracle for steering 
the symbolic execution in the path space towards the target location. 

11 Conclusion 

We presented an algorithm that computes, for a given target program location, a necessary condition $\varphi$ representing 
an over-approximated set of real program paths leading to the target. We proposed the use of $\varphi$ in test 
generation tools based on symbolic execution. Having $\varphi$, such a tool can cover the target location faster by 
exploring only program paths in the over-approximated set. We also showed that $\varphi$ can be used in the tools 
very easily and naturally. Finally, our experimental results showed that Pex could benefit from 
our algorithm. 

A Examples 

A.1 Iterating a variable A of an array type 

Example 1. We want to compute an iterated value of an array variable A for the following loop 

for (int i = 0; i < n; ++i) 
A[i] = i; 

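After symbolic execution of the loop's body, variable A has the value $\lambda\chi \,.\, ite(\chi = \underline{i}, \underline{i}, \underline{A}(\chi))$. Suppose that 
variable i was already iterated, i.e. $\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa + \underline{i}$. Then expressions $e$ and $e'$ from Algorithm [7] are assigned 
as follows 

$e = \lambda\chi \,.\, \underline{A}(\chi)$ 

$e' = \lambda\chi \,.\, ite(\chi = \kappa + \underline{i}, \kappa + \underline{i}, \underline{A}(\chi))$. 

Since there is only a single backbone path $l$ in the backbone tree of the program, $pc(l) = true$. According to 
Table [2] we receive the following values: 

$w_1 = (\kappa + \underline{i})[\kappa/\tau] = \tau + \underline{i}$ 
$g_1 = \kappa$ 
$\zeta(\{\tau + \underline{i}\}, 1, 1) = \chi \neq \tau + \underline{i}$ 
$\phi_1(\{\tau + \underline{i}\}, \{\kappa\}) = \forall\tau' (\tau < \tau' < \kappa \rightarrow \chi \neq \tau' + \underline{i})$ 
$\gamma_1 = true$ 
$\psi = true$ 
$h_1 = \exists\tau \,.\, \chi = \tau + \underline{i} \wedge 0 \le \tau < \kappa \wedge \forall\tau' (\tau < \tau' < \kappa \rightarrow \chi \neq \tau' + \underline{i})$ 
$t_1 = (\kappa + \underline{i})[\kappa/\tau]\{w_1/\chi\} = (\tau + \underline{i})\{(\tau + \underline{i})/\chi\} = \chi$ 

Therefore the resulting iterated value for array A is 

$\lambda\chi \,.\, ite(\exists\tau \,.\, \chi = \tau + \underline{i} \wedge 0 \le \tau < \kappa \wedge \forall\tau' (\tau < \tau' < \kappa \rightarrow \chi \neq \tau' + \underline{i}),\ \chi,\ \underline{A}(\chi))$ 

Note that in special cases like this one we can simply detect that $\tau + \underline{i}$ is a monotone function. Since there 
are no other writes to the array, the condition $\phi_1$ is redundant and the expression can be simplified into 

$\lambda\chi \,.\, ite(\exists\tau \,.\, \chi = \tau + \underline{i} \wedge 0 \le \tau < \kappa,\ \chi,\ \underline{A}(\chi))$ 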
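The simplified iterated value of Example 1 can be cross-checked against a concrete run of the loop. The following is only a minimal sketch under stated assumptions: the initial value of i is taken to be 0, the path counter is instantiated as max{0, n}, and the function names `run_loop`, `iterated_value` and `agrees_with_loop` are hypothetical helpers, not part of the algorithm.

```cpp
#include <cassert>
#include <vector>

// Concrete execution of the loop from Example 1 (requires n <= A.size()).
std::vector<int> run_loop(std::vector<int> A, int n) {
    for (int i = 0; i < n; ++i)
        A[i] = i;
    return A;
}

// Simplified iterated value:
//   lambda chi . ite(exists tau . chi = tau + i0 && 0 <= tau < kappa, chi, A(chi))
// i0 is the initial value of i, kappa the number of loop iterations.
int iterated_value(const std::vector<int>& A, int chi, int i0, int kappa) {
    int tau = chi - i0;  // the only possible witness, because tau + i0 is injective
    bool written = (0 <= tau && tau < kappa);
    return written ? chi : A[chi];
}

// Compares the closed form with the concrete loop on every array index.
bool agrees_with_loop(const std::vector<int>& A, int n) {
    std::vector<int> expected = run_loop(A, n);
    int kappa = n > 0 ? n : 0;  // assumption: the loop iterates max{0, n} times
    for (int chi = 0; chi < static_cast<int>(A.size()); ++chi)
        if (iterated_value(A, chi, 0, kappa) != expected[chi])
            return false;
    return true;
}
```

Calling `agrees_with_loop` on a few arrays and bounds n gives a quick sanity check that the existential condition selects exactly the overwritten indices.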

Example 2. Let us consider the following C++ program: 

for (int i = 1; i < n; ++i) { 
    A[i-1] = i; 
    A[i] = i; 
} 

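After symbolic execution of the loop's body, variable A has the value $\lambda\chi \,.\, ite(\chi = \underline{i}, \underline{i}, ite(\chi = \underline{i} - 1, \underline{i}, \underline{A}(\chi)))$. 
Suppose that variable i was already iterated, i.e. $\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa + \underline{i}$. Then expressions $e$ and $e'$ from Algorithm [7] 
are assigned as follows 

$e = \lambda\chi \,.\, \underline{A}(\chi)$ 

$e' = \lambda\chi \,.\, ite(\chi = \kappa + \underline{i}, \kappa + \underline{i}, ite(\chi = \kappa + \underline{i} - 1, \kappa + \underline{i}, \underline{A}(\chi)))$. 

Since there is only a single backbone path $l$ in the backbone tree of the program, $pc(l) = true$. According to 
Table [2] we receive the following values: 

$w_1 = (\kappa + \underline{i})[\kappa/\tau] = \tau + \underline{i}$ 
$w_2 = (\kappa + \underline{i} - 1)[\kappa/\tau] = \tau + \underline{i} - 1$ 
$g_1 = \kappa$ 
$g_2 = \kappa$ 
$\zeta(\{\tau + \underline{i}\}, 1, 1) = \chi \neq \tau + \underline{i}$ 
$\zeta(\{\tau + \underline{i} - 1\}, 1, 2) = \chi \neq \tau + \underline{i} - 1$ 
$\phi_1(\{\tau + \underline{i}, \tau + \underline{i} - 1\}, \{\kappa, \kappa\}) = \forall\tau' (\tau < \tau' < \kappa \rightarrow (\chi \neq \tau' + \underline{i} \wedge \chi \neq \tau' + \underline{i} - 1))$ 
$\zeta(\{\tau + \underline{i}\}, 2, 1) = \chi \neq \tau + \underline{i}$ 
$\zeta(\{\tau + \underline{i} - 1\}, 2, 2) = \chi \neq \tau + \underline{i} - 1$ 
$\phi_2(\{\tau + \underline{i}, \tau + \underline{i} - 1\}, \{\kappa, \kappa\}) = \forall\tau' (\tau < \tau' < \kappa \rightarrow (\chi \neq \tau' + \underline{i} \wedge \chi \neq \tau' + \underline{i} - 1))$ 
$\gamma_1 = true$ 
$\psi = true$ 
$h_1 = \exists\tau \,.\, \chi = \tau + \underline{i} \wedge 0 \le \tau < \kappa \wedge \forall\tau' (\tau < \tau' < \kappa \rightarrow (\chi \neq \tau' + \underline{i} \wedge \chi \neq \tau' + \underline{i} - 1))$ 
$h_2 = \exists\tau \,.\, \chi = \tau + \underline{i} - 1 \wedge 0 \le \tau < \kappa \wedge \forall\tau' (\tau < \tau' < \kappa \rightarrow (\chi \neq \tau' + \underline{i} \wedge \chi \neq \tau' + \underline{i} - 1))$ 
$t_1 = (\kappa + \underline{i})[\kappa/\tau]\{w_1/\chi\} = (\tau + \underline{i})\{(\tau + \underline{i})/\chi\} = \chi$ 
$t_2 = (\kappa + \underline{i})[\kappa/\tau]\{w_2/\chi\} = (\tau + \underline{i})\{(\tau + \underline{i} - 1)/\chi\} = \chi + 1$ 

Therefore the resulting iterated value for array A is 

\[
\begin{aligned}
\lambda\chi \,.\, ite(&\exists\tau \,.\, \chi = \tau + \underline{i} \wedge 0 \le \tau < \kappa \wedge \forall\tau' (\tau < \tau' < \kappa \rightarrow (\chi \neq \tau' + \underline{i} \wedge \chi \neq \tau' + \underline{i} - 1)),\ \chi, \\
ite(&\exists\tau \,.\, \chi = \tau + \underline{i} - 1 \wedge 0 \le \tau < \kappa \wedge \forall\tau' (\tau < \tau' < \kappa \rightarrow (\chi \neq \tau' + \underline{i} \wedge \chi \neq \tau' + \underline{i} - 1)),\ \chi + 1, \\
&\underline{A}(\chi)))
\end{aligned}
\]

The condition of the outer ite expression can be satisfied only for $\tau = \kappa - 1$. On the other hand, the 
condition of the nested ite expression is satisfiable for all values of $\tau$. Also note that $\tau + \underline{i} - 1$ is a monotone 
function. Therefore, we can simplify the iterated value into 

$\lambda\chi \,.\, ite(\chi = \kappa - 1 + \underline{i},\ \chi,\ ite(\exists\tau \,.\, \chi = \tau + \underline{i} - 1 \wedge 0 \le \tau < \kappa,\ \chi + 1,\ \underline{A}(\chi)))$ 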
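The simplified value of Example 2 can be compared with a concrete execution in the same way. This is a sketch under stated assumptions: the initial value of i is 1, the path counter is instantiated as max{0, n - 1}, and the outer ite case is guarded by kappa > 0, because its witness tau = kappa - 1 must also satisfy 0 <= tau. The names `run_loop2`, `iterated_value2` and `agrees_with_loop2` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Concrete execution of the loop from Example 2 (requires n <= A.size()).
std::vector<int> run_loop2(std::vector<int> A, int n) {
    for (int i = 1; i < n; ++i) {
        A[i - 1] = i;
        A[i] = i;
    }
    return A;
}

// Simplified iterated value:
//   lambda chi . ite(chi = kappa - 1 + i0, chi,
//                ite(exists tau . chi = tau + i0 - 1 && 0 <= tau < kappa, chi + 1, A(chi)))
int iterated_value2(const std::vector<int>& A, int chi, int i0, int kappa) {
    // Outer case: the last write A[i] = i; it only exists if the loop ran at all.
    if (kappa > 0 && chi == kappa - 1 + i0)
        return chi;
    // Nested case: a write A[i-1] = i that was not overwritten later.
    int tau = chi - i0 + 1;  // unique witness for chi = tau + i0 - 1
    if (0 <= tau && tau < kappa)
        return chi + 1;
    return A[chi];
}

bool agrees_with_loop2(const std::vector<int>& A, int n) {
    std::vector<int> expected = run_loop2(A, n);
    int kappa = n > 1 ? n - 1 : 0;  // assumption: the loop iterates max{0, n - 1} times
    for (int chi = 0; chi < static_cast<int>(A.size()); ++chi)
        if (iterated_value2(A, chi, 1, kappa) != expected[chi])
            return false;
    return true;
}
```

For n = 4 both the loop and the closed form yield the prefix 1, 2, 3, 3 and leave the remaining elements untouched.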

Example 3. Let us consider the following C++ program: 

for (int i = 0; i < n; ++i) 
    if (i % 2 == 0) // i.e. is 'i' even? 
        A[i] = 2*i + 1; 
    else 
        A[i] = 5; 

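There are two backbone paths along the loop. The first one, $l_1$, goes through the positive branch and the second 
path, $l_2$, through the negative one. After symbolic execution of the loop's body, variable A has the following 
values: 

$\theta(l_1)(\mathtt{A}) = \lambda\chi \,.\, ite(\chi = \underline{i}, 2\underline{i} + 1, \underline{A}(\chi))$ 
$\theta(l_2)(\mathtt{A}) = \lambda\chi \,.\, ite(\chi = \underline{i}, 5, \underline{A}(\chi))$ 

Suppose that variable i was already iterated, i.e. $\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa_1 + \kappa_2 + \underline{i}$. The path counters $\kappa_1$ and $\kappa_2$ 
are newly introduced path counters for backbone paths $l_1$ and $l_2$ respectively. The loop at Algorithm [7] is 
executed twice, first for backbone path $l_1$ and then for $l_2$. At the first execution, for the backbone path $l_1$, 
the expressions $e$ and $e'$ are as follows 

$e \equiv \lambda\chi \,.\, \underline{A}(\chi)$ 

$e' \equiv \lambda\chi \,.\, ite(\chi = \kappa_1 + \kappa_2 + \underline{i},\ 2(\kappa_1 + \kappa_2 + \underline{i}) + 1,\ \underline{A}(\chi))$ 

For the backbone path $l_1$ we have $pc(l_1)\theta^{\vec{\kappa}} \equiv (\kappa_1 + \kappa_2 + \underline{i}) \bmod 2 = 0$. According to Table [2] we receive the 
following values: 

$w_1 \equiv (\kappa_1 + \kappa_2 + \underline{i})[\vec{\kappa}/\vec{\tau}] = \tau_1 + \tau_2 + \underline{i}$ 
$g_1 \equiv (\kappa_1, \kappa_2)^T$ 
$\zeta(\{\tau_1 + \tau_2 + \underline{i}\}, 1, 1) \equiv \chi \neq \tau_1 + \tau_2 + \underline{i}$ 
$\phi_1(\{\tau_1 + \tau_2 + \underline{i}\}, \{\binom{\kappa_1}{\kappa_2}\}) \equiv \forall \binom{\tau_1'}{\tau_2'} \left( \binom{\tau_1}{\tau_2} < \binom{\tau_1'}{\tau_2'} < \binom{\kappa_1}{\kappa_2} \rightarrow \chi \neq \tau_1' + \tau_2' + \underline{i} \right)$ 
$\gamma_1 \equiv true$ 
$\psi[\vec{\kappa}/\vec{\tau}] \equiv (pc(l_1)\theta^{\vec{\kappa}})[\vec{\kappa}/\vec{\tau}] \equiv (\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0$ 
$h_1 \equiv \exists \binom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \binom{\tau_1}{\tau_2} < \binom{\kappa_1}{\kappa_2} \wedge \phi_1(\{\tau_1 + \tau_2 + \underline{i}\}, \{\binom{\kappa_1}{\kappa_2}\})$ 
$t_1 \equiv (2(\kappa_1 + \kappa_2 + \underline{i}) + 1)[\vec{\kappa}/\vec{\tau}]\{(\tau_1 + \tau_2 + \underline{i})/\chi\} = 2\chi + 1$ 

Therefore the new value of $e$ is 

\[
\begin{aligned}
\lambda\chi \,.\, ite(&\exists \tbinom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \tbinom{\tau_1}{\tau_2} < \tbinom{\kappa_1}{\kappa_2} \wedge {} \\
&\forall \tbinom{\tau_1'}{\tau_2'} \left( \tbinom{\tau_1}{\tau_2} < \tbinom{\tau_1'}{\tau_2'} < \tbinom{\kappa_1}{\kappa_2} \rightarrow \chi \neq \tau_1' + \tau_2' + \underline{i} \right) \wedge {} \\
&(\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0, \\
&2\chi + 1,\ \underline{A}(\chi))
\end{aligned}
\]

Since $\tau_1 + \tau_2 + \underline{i}$ is a monotone function, the condition $\phi_1$ is redundant and the expression $e$ can be simplified 
into 

$\lambda\chi \,.\, ite(\exists \binom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \binom{\tau_1}{\tau_2} < \binom{\kappa_1}{\kappa_2} \wedge (\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0,\ 2\chi + 1,\ \underline{A}(\chi))$ 

At the second execution of the loop at Algorithm [7], for the backbone path $l_2$, the expressions $e$ and $e'$ are 
as follows 

$e \equiv \lambda\chi \,.\, ite(\exists \binom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \binom{\tau_1}{\tau_2} < \binom{\kappa_1}{\kappa_2} \wedge (\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0,\ 2\chi + 1,\ \underline{A}(\chi))$ 
$e' \equiv \lambda\chi \,.\, ite(\chi = \kappa_1 + \kappa_2 + \underline{i},\ 5,\ \underline{A}(\chi))$. 

For the backbone path $l_2$ we have $pc(l_2)\theta^{\vec{\kappa}} \equiv (\kappa_1 + \kappa_2 + \underline{i}) \bmod 2 \neq 0$. According to Table [2] we receive the 
following values: 

$w_1 \equiv \tau_1 + \tau_2 + \underline{i}$ 
$w_2 \equiv (\kappa_1 + \kappa_2 + \underline{i})[\vec{\kappa}/\vec{\tau}] = \tau_1 + \tau_2 + \underline{i}$ 
$g_1 \equiv (\kappa_1, \kappa_2)^T$ 
$g_2 \equiv (\kappa_1, \kappa_2)^T$ 
$\zeta(\{\tau_1 + \tau_2 + \underline{i}\}, 1, 1) \equiv \chi \neq \tau_1 + \tau_2 + \underline{i}$ 
$\zeta(\{\tau_1 + \tau_2 + \underline{i}\}, 1, 2) \equiv \chi \neq \tau_1 + \tau_2 + \underline{i}$ 
$\phi_1(\{w_1, w_2\}, \{g_1, g_2\}) \equiv \phi_1(\{w_1\}, \{g_1\})$ (since $w_1 = w_2$ and $g_1 = g_2$) 
$\qquad \equiv \forall \binom{\tau_1'}{\tau_2'} \left( \binom{\tau_1}{\tau_2} < \binom{\tau_1'}{\tau_2'} < \binom{\kappa_1}{\kappa_2} \rightarrow \chi \neq \tau_1' + \tau_2' + \underline{i} \right)$ 
$\zeta(\{\tau_1 + \tau_2 + \underline{i}\}, 2, 1) \equiv \chi \neq \tau_1 + \tau_2 + \underline{i}$ 
$\zeta(\{\tau_1 + \tau_2 + \underline{i}\}, 2, 2) \equiv \chi \neq \tau_1 + \tau_2 + \underline{i}$ 
$\phi_2(\{w_1, w_2\}, \{g_1, g_2\}) \equiv \phi_2(\{w_1\}, \{g_1\})$ (since $w_1 = w_2$ and $g_1 = g_2$) 
$\qquad \equiv \forall \binom{\tau_1'}{\tau_2'} \left( \binom{\tau_1}{\tau_2} < \binom{\tau_1'}{\tau_2'} < \binom{\kappa_1}{\kappa_2} \rightarrow \chi \neq \tau_1' + \tau_2' + \underline{i} \right)$ 
$\gamma_1 \equiv (\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0$ 
$\psi[\vec{\kappa}/\vec{\tau}] \equiv (pc(l_2)\theta^{\vec{\kappa}})[\vec{\kappa}/\vec{\tau}] \equiv (\tau_1 + \tau_2 + \underline{i}) \bmod 2 \neq 0$ 
$h_1 \equiv \exists \binom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \binom{\tau_1}{\tau_2} < \binom{\kappa_1}{\kappa_2} \wedge \phi_1(\{w_1, w_2\}, \{g_1, g_2\}) \wedge (\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0$ 
$h_2 \equiv \exists \binom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \binom{\tau_1}{\tau_2} < \binom{\kappa_1}{\kappa_2} \wedge \phi_2(\{w_1, w_2\}, \{g_1, g_2\}) \wedge (\tau_1 + \tau_2 + \underline{i}) \bmod 2 \neq 0$ 
$t_1 \equiv 2\chi + 1$ 
$t_2 \equiv 5[\vec{\kappa}/\vec{\tau}]\{(\tau_1 + \tau_2 + \underline{i})/\chi\} = 5$ 

Therefore the resulting iterated value for array A is 

\[
\begin{aligned}
\lambda\chi \,.\, ite(&\exists \tbinom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \tbinom{\tau_1}{\tau_2} < \tbinom{\kappa_1}{\kappa_2} \wedge {} \\
&\forall \tbinom{\tau_1'}{\tau_2'} \left( \tbinom{\tau_1}{\tau_2} < \tbinom{\tau_1'}{\tau_2'} < \tbinom{\kappa_1}{\kappa_2} \rightarrow \chi \neq \tau_1' + \tau_2' + \underline{i} \right) \wedge {} \\
&(\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0,\ 2\chi + 1, \\
ite(&\exists \tbinom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \tbinom{\tau_1}{\tau_2} < \tbinom{\kappa_1}{\kappa_2} \wedge {} \\
&\forall \tbinom{\tau_1'}{\tau_2'} \left( \tbinom{\tau_1}{\tau_2} < \tbinom{\tau_1'}{\tau_2'} < \tbinom{\kappa_1}{\kappa_2} \rightarrow \chi \neq \tau_1' + \tau_2' + \underline{i} \right) \wedge {} \\
&(\tau_1 + \tau_2 + \underline{i}) \bmod 2 \neq 0,\ 5,\ \underline{A}(\chi)))
\end{aligned}
\]

We can see that conditions $\phi_1$ and $\phi_2$ are redundant in the expression (since $w_1 = w_2$ and $g_1 = g_2$). 
Therefore we can simplify the resulting value into 

\[
\begin{aligned}
\lambda\chi \,.\, ite(&\exists \tbinom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \tbinom{\tau_1}{\tau_2} < \tbinom{\kappa_1}{\kappa_2} \wedge (\tau_1 + \tau_2 + \underline{i}) \bmod 2 = 0,\ 2\chi + 1, \\
ite(&\exists \tbinom{\tau_1}{\tau_2} \,.\, \chi = \tau_1 + \tau_2 + \underline{i} \wedge 0 \le \tbinom{\tau_1}{\tau_2} < \tbinom{\kappa_1}{\kappa_2} \wedge (\tau_1 + \tau_2 + \underline{i}) \bmod 2 \neq 0,\ 5,\ \underline{A}(\chi)))
\end{aligned}
\]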

Example 4. Let us consider the following C++ program: 

for (int i = 0; i < m; ++i) { 
    id = B[i*(n+1)+1]; 
    for (int j = 0; j < n; ++j) 
        A[id][j] = B[id*(n+1)+j+2]; 
} 

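The program consists of two nested loops. We first express an iterated value of the 2D array A after the inner 
loop. Then we apply the same procedure to express A after the outer one. 

After symbolic execution of the inner loop's body, variable A has the value $\lambda\chi \,.\, ite(\chi = (\underline{id}, \underline{j})^T,\ \underline{B}(\underline{id}(\underline{n} + 1) + \underline{j} + 2),\ \underline{A}(\chi))$. 
Suppose that variable j was already iterated, i.e. $\theta^{\vec{\kappa}}(\mathtt{j}) = \kappa_1 + \underline{j}$, where $\kappa_1$ is a path 
counter introduced for the only backbone path of the inner loop. Then expressions $e$ and $e'$ from Algorithm [7] 
are assigned as follows 

$e = \lambda\chi \,.\, \underline{A}(\chi)$ 

$e' = \lambda\chi \,.\, ite(\chi = (\underline{id}, \kappa_1 + \underline{j})^T,\ \underline{B}(\underline{id}(\underline{n} + 1) + \kappa_1 + \underline{j} + 2),\ \underline{A}(\chi))$. 

Note that $\chi = (\chi_1, \chi_2)^T$. Since there is only a single backbone path $l$ in the inner loop, $pc(l) = true$. According 
to Table [2] we receive the following values: 

$w_1 \equiv (\underline{id}, \kappa_1 + \underline{j})^T[\kappa_1/\tau_1] = (\underline{id}, \tau_1 + \underline{j})^T$ 
$g_1 = \kappa_1$ 
$\phi_1(\{w_1\}, \{g_1\}) \equiv \forall\tau_1' (\tau_1 < \tau_1' < \kappa_1 \rightarrow \chi \neq (\underline{id}, \tau_1' + \underline{j})^T)$ 
$\gamma_1 \equiv true$ 
$\psi \equiv true$ 
$h_1 \equiv \exists\tau_1 \,.\, \chi = (\underline{id}, \tau_1 + \underline{j})^T \wedge 0 \le \tau_1 < \kappa_1 \wedge \forall\tau_1' (\tau_1 < \tau_1' < \kappa_1 \rightarrow \chi \neq (\underline{id}, \tau_1' + \underline{j})^T)$ 
$t_1 \equiv \underline{B}(\underline{id}(\underline{n} + 1) + \kappa_1 + \underline{j} + 2)[\kappa_1/\tau_1]\{w_1/\chi\} = \underline{B}(\underline{id}(\underline{n} + 1) + \tau_1 + \underline{j} + 2)\{(\underline{id}, \tau_1 + \underline{j})^T/\chi\} = \underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2)$ 

Therefore the resulting iterated value for array A from the inner loop is 

\[
\begin{aligned}
\lambda\chi \,.\, ite(&\exists\tau_1 \,.\, \chi = (\underline{id}, \tau_1 + \underline{j})^T \wedge 0 \le \tau_1 < \kappa_1 \wedge \forall\tau_1' (\tau_1 < \tau_1' < \kappa_1 \rightarrow \chi \neq (\underline{id}, \tau_1' + \underline{j})^T), \\
&\underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2),\ \underline{A}(\chi))
\end{aligned}
\]

Since both functions $\underline{id}$ and $\tau_1 + \underline{j}$ are monotone and there is no other write to A, we can simplify the 
expression into 

$\lambda\chi \,.\, ite(\exists\tau_1 \,.\, \chi = (\underline{id}, \tau_1 + \underline{j})^T \wedge 0 \le \tau_1 < \kappa_1,\ \underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2),\ \underline{A}(\chi))$ 

We may proceed to the outer loop. There we first eliminate the imported path counter $\kappa_1$ by 
substituting all its occurrences by the expression $\max\{0, \underline{n}\}$. We discuss the elimination of imported path 
counters in Section 5.1 and the computation of an expression to be substituted in Section 5.1.2. We also 
describe the computation of the expression $\max\{0, \underline{n}\}$ in Example 5. Also note that the values of variables j and 
id are set to $0$ and $\underline{B}(\underline{i}(\underline{n} + 1) + 1)$ respectively, before entering the inner loop. Therefore, after symbolic 
execution of the outer loop's body the variable A has the value 

$\lambda\chi \,.\, ite(\exists\tau_1 \,.\, \chi = (\underline{B}(\underline{i}(\underline{n} + 1) + 1), \tau_1)^T \wedge 0 \le \tau_1 < \max\{0, \underline{n}\},\ \underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2),\ \underline{A}(\chi))$. 

Suppose that variable i was already iterated, i.e. $\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa + \underline{i}$, where $\kappa$ is a path counter introduced 
for the only backbone path of the outer loop. Then expressions $e$ and $e'$ from Algorithm [7] are assigned as 
follows 

$e = \lambda\chi \,.\, \underline{A}(\chi)$ 

$e' = \lambda\chi \,.\, ite(\exists\tau_1 \,.\, \chi = (\underline{B}((\kappa + \underline{i})(\underline{n} + 1) + 1), \tau_1)^T \wedge 0 \le \tau_1 < \max\{0, \underline{n}\},\ \underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2),\ \underline{A}(\chi))$. 

Since there is only a single backbone path $l$ in the outer loop, $pc(l) = true$. According to Table [2] we receive 
the following values: 

$w_1 \equiv (\underline{B}((\kappa + \underline{i})(\underline{n} + 1) + 1), \tau_1)^T[\kappa/\tau] = (\underline{B}((\tau + \underline{i})(\underline{n} + 1) + 1), \tau_1)^T$ 
$g_1 \equiv (\max\{0, \underline{n}\}, \kappa)^T$ 
$\zeta(\{w_1\}, 1, 1) \equiv \chi \neq (\underline{B}((\tau + \underline{i})(\underline{n} + 1) + 1), \tau_1)^T$ 
$\phi_1(\{w_1\}, \{g_1\}) \equiv \forall \binom{\tau_1'}{\tau'} \left( \binom{\tau_1}{\tau} < \binom{\tau_1'}{\tau'} < \binom{\max\{0, \underline{n}\}}{\kappa} \rightarrow \chi \neq (\underline{B}((\tau' + \underline{i})(\underline{n} + 1) + 1), \tau_1')^T \right)$ 
$\gamma_1 \equiv true$ 
$\psi \equiv true$ 
$h_1 \equiv \exists \binom{\tau_1}{\tau} \,.\, \chi = (\underline{B}((\tau + \underline{i})(\underline{n} + 1) + 1), \tau_1)^T \wedge 0 \le \binom{\tau_1}{\tau} < \binom{\max\{0, \underline{n}\}}{\kappa} \wedge \phi_1(\{w_1\}, \{g_1\})$ 
$t_1 \equiv \underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2)[\kappa/\tau]\{w_1/\chi\} = \underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2)$ 

Therefore the resulting iterated value for array A is 

\[
\begin{aligned}
\lambda\chi \,.\, ite(&\exists \tbinom{\tau_1}{\tau} \,.\, \chi = (\underline{B}((\tau + \underline{i})(\underline{n} + 1) + 1), \tau_1)^T \wedge 0 \le \tbinom{\tau_1}{\tau} < \tbinom{\max\{0, \underline{n}\}}{\kappa} \wedge {} \\
&\forall \tbinom{\tau_1'}{\tau'} \left( \tbinom{\tau_1}{\tau} < \tbinom{\tau_1'}{\tau'} < \tbinom{\max\{0, \underline{n}\}}{\kappa} \rightarrow \chi \neq (\underline{B}((\tau' + \underline{i})(\underline{n} + 1) + 1), \tau_1')^T \right), \\
&\underline{B}(\chi_1(\underline{n} + 1) + \chi_2 + 2),\ \underline{A}(\chi))
\end{aligned}
\]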

A.2 Building formula $S_v$ and using an SMT solver on it 

Example 5. Let us consider the following C++ program 

for (int i = 0; i < m; ++i) 
for (int j =0; j < n; ++j) 
A[i] [j] = 0; 

There is only a single backbone path (the body of the outer loop) in $B$ of the program. Let $v_1, \ldots, v_7$ be all 
its vertices. Then $v_1$ is $l_s$, $v_7$ is $l_t$, $v_4$ is the only component vertex of $B$, and $lst(v_5)$ is the only exit vertex 
of SCC $C_{v_4}$. Then after a symbolic execution of $B$ the map $\Psi$ has the following content: 

• $\Psi(v_1) = true$ 
• $\Psi(v_2) = \underline{i} < \underline{m}$ 
• $\Psi(v_3) = true$ 
• $\Psi(v_4) = \forall\tau_{v_4} (0 \le \tau_{v_4} < \kappa_{v_4} \rightarrow \tau_{v_4} < \underline{n})$ 
• $\Psi(v_5) = \kappa_{v_4} \ge \underline{n}$ 
• $\Psi(v_6) = true$ 
• $\Psi(v_7) = true$ 

After elimination of the imported counter $\kappa_{v_4}$ we receive (only changes are shown) 

• $\Psi(v_4) = \forall s\, (0 \le s < s_{v_4} \rightarrow s < \underline{n})$ 
• $\Psi(v_5) = s_{v_4} \ge \underline{n}$ 

The symbolic state $\Theta(v_7)$ stores the following values of variables i and j 

• $\Theta(v_7)(\mathtt{i}) = \underline{i} + 1$ 
• $\Theta(v_7)(\mathtt{j}) = \kappa_{v_4}$ 

After elimination of the imported counter $\kappa_{v_4}$ we receive (only changes are shown) 

• $\Theta(v_7)(\mathtt{j}) = s_{v_4}$ 

Let us now suppose that $\theta^{\vec{\kappa}}$ stores the following value of i 

• $\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa + \underline{i}$ 

Note that $\kappa$ is a fresh path counter introduced for our backbone path. We are ready to build formula $S_{v_4}$. 
We start with $\Gamma(v_4, \Omega, \theta^{\vec{\kappa}})$: 

$\Gamma(v_4, \Omega, \theta^{\vec{\kappa}}) = (0 \le s_{v_4} - 1 \rightarrow s_{v_4} - 1 < \underline{n}) \wedge s_{v_4} \ge \underline{n}$ 

Note that $\Omega(v_4)$ is evaluated according to the first case, since $v_4$ is a component vertex. But $\Omega(v_5)$ is 
evaluated according to the first case. Also note that substitution of $\theta^{\vec{\kappa}}$ into the formulae did not incorporate 
the introduced counter $\kappa$ into the resulting formula. Therefore we do not need to introduce matrix $M$. Since 
$\Gamma(v_4, \Omega, \theta^{\vec{\kappa}})$ contains only a single basic symbol $\underline{n}$, we have $\vec{a} = (\underline{n})^T = \underline{n}$. And vector $\vec{w} = (w_1, w_2)^T$, because we 
have only one basic symbol in the formula. The formula $S_{v_4}$ looks as follows 

$S_{v_4} = \exists w_1, w_2\ \forall \underline{n}, s_{v_4}\ ((s_{v_4} \ge 0 \wedge \Gamma(v_4, \Omega, \theta^{\vec{\kappa}})) \rightarrow s_{v_4} = \max\{0, w_1\underline{n} + w_2\})$ 

And when we substitute formula $\Gamma$ into $S_{v_4}$ we obtain 

$S_{v_4} = \exists w_1, w_2\ \forall \underline{n}, s_{v_4}\ ((s_{v_4} \ge 0 \wedge (0 \le s_{v_4} - 1 \rightarrow s_{v_4} - 1 < \underline{n}) \wedge s_{v_4} \ge \underline{n}) \rightarrow s_{v_4} = \max\{0, w_1\underline{n} + w_2\})$ 

Then we ask an SMT solver whether the formula is satisfiable. If so, we further ask for a model 
to get values of the integers $w_1$ and $w_2$. We see that the formula is satisfiable with $w_1 = 1$ and $w_2 = 0$. Therefore 
we return the symbolic expression: 

$\max\{0, \underline{n}\}$ 
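The model found for Example 5 can be sanity-checked without an SMT solver by enumerating a bounded range of integers. The sketch below is an illustration only, not the method used in the paper (which queries an SMT solver over all integers); the helper names `premise` and `model_holds` and the choice of bound are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>

// Premise of the implication in Example 5:
//   s >= 0  &&  (0 <= s - 1 -> s - 1 < n)  &&  s >= n
bool premise(int n, int s) {
    bool gamma = (!(0 <= s - 1) || (s - 1 < n)) && (s >= n);
    return s >= 0 && gamma;
}

// Checks for all n, s in [-bound, bound] that the premise implies
// s == max{0, w1*n + w2}. This is a bounded check, not a proof.
bool model_holds(int w1, int w2, int bound) {
    for (int n = -bound; n <= bound; ++n)
        for (int s = -bound; s <= bound; ++s)
            if (premise(n, s) && s != std::max(0, w1 * n + w2))
                return false;
    return true;
}
```

With w1 = 1 and w2 = 0 the check passes on the tested range, whereas other candidates such as w1 = 1, w2 = 1 are refuted by a small counterexample (n = 0, s = 0).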

Example 6. Let us consider the following C++ program 

for (int i = 0; i < m; ++i) 
    for (int j = i; j < n; ++j) 
        A[i][j] = 0; 



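There is only a single backbone path (the body of the outer loop) in $B$ of the program. Let $v_1, \ldots, v_7$ be all 
its vertices. Then $v_1$ is $l_s$, $v_7$ is $l_t$, $v_4$ is the only component vertex of $B$, and $lst(v_5)$ is the only exit vertex 
of SCC $C_{v_4}$. Then after a symbolic execution of $B$ the map $\Psi$ has the following content: 

• $\Psi(v_1) = true$ 
• $\Psi(v_2) = \underline{i} < \underline{m}$ 
• $\Psi(v_3) = true$ 
• $\Psi(v_4) = \forall\tau_{v_4} (0 \le \tau_{v_4} < \kappa_{v_4} \rightarrow \tau_{v_4} + \underline{i} < \underline{n})$ 
• $\Psi(v_5) = \kappa_{v_4} + \underline{i} \ge \underline{n}$ 
• $\Psi(v_6) = true$ 
• $\Psi(v_7) = true$ 

After elimination of the imported counter $\kappa_{v_4}$ we receive (only changes are shown) 

• $\Psi(v_4) = \forall s\, (0 \le s < s_{v_4} \rightarrow s + \underline{i} < \underline{n})$ 
• $\Psi(v_5) = s_{v_4} + \underline{i} \ge \underline{n}$ 

The symbolic state $\Theta(v_7)$ stores the following values of variables i and j 

• $\Theta(v_7)(\mathtt{i}) = \underline{i} + 1$ 
• $\Theta(v_7)(\mathtt{j}) = \kappa_{v_4} + \underline{i}$ 

After elimination of the imported counter $\kappa_{v_4}$ we receive (only changes are shown) 

• $\Theta(v_7)(\mathtt{j}) = s_{v_4} + \underline{i}$ 

Let us now suppose that $\theta^{\vec{\kappa}}$ stores the following values of i and j 

• $\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa + \underline{i}$ 
• $\theta^{\vec{\kappa}}(\mathtt{j}) = s_{v_4} + \kappa + \underline{i}$ 

Note that $\kappa$ is a fresh path counter introduced for our backbone path. We are ready to build formula $S_{v_4}$. 
We start with $\Gamma(v_4, \Omega, \theta^{\vec{\kappa}})$: 

$\Gamma(v_4, \Omega, \theta^{\vec{\kappa}}) = (0 \le s_{v_4} - 1 \rightarrow s_{v_4} - 1 + \kappa + \underline{i} < \underline{n}) \wedge s_{v_4} + \kappa + \underline{i} \ge \underline{n}$ 

Note that $\Omega(v_4)$ is evaluated according to the first case, since $v_4$ is a component vertex. But $\Omega(v_5)$ is evaluated 
according to the first case. Also note that substitution of $\theta^{\vec{\kappa}}$ into the formulae incorporated the introduced 
counter $\kappa$ into the resulting formula. Therefore we have $\vec{\kappa} = (\kappa)^T = \kappa$ and matrix $M = (m_1, m_2, m_3)^T$ is of 
type $(2 + 1) \times 1$, since there are two basic symbols $\underline{n}, \underline{i}$ and just one counter $\kappa$ in the formula $\Gamma$. Further we 
have $\vec{a} = (\underline{n}, \underline{i})^T$, and vector $\vec{w} = (w_1, w_2, w_3)^T$. The formula $S_{v_4}$ looks as follows 

$S_{v_4} = \exists M, \vec{w}\ \forall \vec{a}, \kappa, s_{v_4}\ \left((\kappa \ge 0 \wedge s_{v_4} \ge 0 \wedge \Gamma(v_4, \Omega, \theta^{\vec{\kappa}})) \rightarrow s_{v_4} = \max\{0, (M\kappa + \vec{w})^T \binom{\vec{a}}{1}\}\right)$ 

And when we substitute formula $\Gamma$ into $S_{v_4}$ we obtain 

\[
\begin{aligned}
S_{v_4} = \exists M, \vec{w}\ \forall \vec{a}, \kappa, s_{v_4}\ \Big(&(\kappa \ge 0 \wedge s_{v_4} \ge 0 \wedge (0 \le s_{v_4} - 1 \rightarrow s_{v_4} - 1 + \kappa + \underline{i} < \underline{n}) \wedge s_{v_4} + \kappa + \underline{i} \ge \underline{n}) \\
&\rightarrow s_{v_4} = \max\{0, (M\kappa + \vec{w})^T \tbinom{\vec{a}}{1}\}\Big)
\end{aligned}
\]

Then we ask the Z3 SMT solver whether the formula is satisfiable. If so, we further ask for a model 
to get values of the integers $m_i$ and $w_j$. We see that the formula is satisfiable with $m_1 = m_2 = 0$, $m_3 = -1$, and 
$w_1 = 1$, $w_2 = -1$, $w_3 = 0$. Therefore we return the symbolic expression: 

$\max\{0, -\kappa + \underline{n} - \underline{i}\}$ 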
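The template $(M\kappa + \vec{w})^T (\underline{n}, \underline{i}, 1)^T$ and the reported model of Example 6 can be checked on a bounded range of integers in the same spirit. This is a sketch only: the paper obtains the model from an SMT solver, whereas the code below merely enumerates small values; `premise6`, `template_value` and `model_holds6` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Premise of the implication in Example 6:
//   kappa >= 0 && s >= 0 && (0 <= s-1 -> s-1+kappa+i < n) && s+kappa+i >= n
bool premise6(int n, int i, int kappa, int s) {
    bool gamma = (!(0 <= s - 1) || (s - 1 + kappa + i < n)) && (s + kappa + i >= n);
    return kappa >= 0 && s >= 0 && gamma;
}

// max{0, (M*kappa + w)^T (n, i, 1)^T} for M = (m1, m2, m3)^T and w = (w1, w2, w3)^T.
int template_value(const std::array<int, 3>& m, const std::array<int, 3>& w,
                   int n, int i, int kappa) {
    int linear = (m[0] * kappa + w[0]) * n
               + (m[1] * kappa + w[1]) * i
               + (m[2] * kappa + w[2]);
    return std::max(0, linear);
}

// Bounded check (not a proof) that the premise implies s == template value.
bool model_holds6(const std::array<int, 3>& m, const std::array<int, 3>& w, int bound) {
    for (int n = -bound; n <= bound; ++n)
        for (int i = -bound; i <= bound; ++i)
            for (int kappa = 0; kappa <= bound; ++kappa)
                for (int s = 0; s <= 3 * bound; ++s)
                    if (premise6(n, i, kappa, s) && s != template_value(m, w, n, i, kappa))
                        return false;
    return true;
}
```

For M = (0, 0, -1) and w = (1, -1, 0) the template evaluates to max{0, n - i - kappa}, which agrees with the expression returned in the example.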

A.3 Hello 

char H[6] = "Hello"; 
int h = 0; 

for (int i = 0; A[i] != 0; ++i) { 
    int j = i, k = 0; 
    while (H[k] != 0 && A[j] != 0 && A[j] == H[k]) { 
        ++j; 
        ++k; 
    } 
    if (H[k] == 0) { h = 1; break; } 
    if (A[j] == 0) break; 
} 

if (h == 1) 
    assert(false); 

Figure 1: Program Hello 

Figure 2: Induced program of loops in Hello (a) Outer loop (b) Inner loop 

Inner loop After symbolic execution of the backbone tree of the induced program we receive the following 
properties. Note that the resulting backbone tree has only a single backbone path ghijkG. 

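Function $\Psi$: 

$\Psi(g) = true$ 
$\Psi(gh) = \underline{H}(\underline{k}) \neq 0$ 
$\Psi(ghi) = \underline{A}(\underline{j}) \neq 0$ 
$\Psi(ghij) = \underline{A}(\underline{j}) = \underline{H}(\underline{k})$ 
$\Psi(ghijk) = true$ 
$\Psi(ghijkG) = true$ 

Function $\Theta$: 

$\Theta(ghijkG)(\mathtt{j}) = \underline{j} + 1$ 
$\Theta(ghijkG)(\mathtt{k}) = \underline{k} + 1$ 
$\Theta(ghijkG)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 
$\Theta(ghijkG)(\mathtt{H}) = \lambda\chi \,.\, \underline{H}(\chi)$ 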

Note that * = >3> and 6 = 9. Therefore, a symbolic state 9 K is: 

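$\theta^{\vec{\kappa}}(\mathtt{j}) = \kappa_1 + \underline{j}$ 
$\theta^{\vec{\kappa}}(\mathtt{k}) = \kappa_1 + \underline{k}$ 
$\theta^{\vec{\kappa}}(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 
$\theta^{\vec{\kappa}}(\mathtt{H}) = \lambda\chi \,.\, \underline{H}(\chi)$ 

And the looping condition $\varphi^{\vec{\kappa}}$ is: 

$\varphi^{\vec{\kappa}} = \forall\tau_1 [0 \le \tau_1 < \kappa_1 \rightarrow (\underline{H}(\tau_1 + \underline{k}) \neq 0 \wedge \underline{A}(\tau_1 + \underline{j}) \neq 0 \wedge \underline{A}(\tau_1 + \underline{j}) = \underline{H}(\tau_1 + \underline{k}))]$ 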

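Outer loop After symbolic execution of the backbone tree of the induced program we receive the following 
properties. Note that the resulting backbone tree has only a single backbone path defyghilmnD, since backbone 
paths going through program edges (g, l) or (h, l) are infeasible. 

Function $\Psi$: 

$\Psi(d) = true$ 
$\Psi(de) = \underline{A}(\underline{i}) \neq 0$ 
$\Psi(def) = true$ 
$\Psi(defy) = true$ 
$\Psi(defyg) = \forall\tau_1 [0 \le \tau_1 < \kappa_1 \rightarrow (\underline{H}(\tau_1) \neq 0 \wedge \underline{A}(\tau_1 + \underline{i}) \neq 0 \wedge \underline{A}(\tau_1 + \underline{i}) = \underline{H}(\tau_1))]$ 
$\Psi(defygh) = \underline{H}(\kappa_1) \neq 0$ 
$\Psi(defyghi) = \underline{A}(\kappa_1 + \underline{i}) \neq 0$ 
$\Psi(defyghil) = \underline{A}(\kappa_1 + \underline{i}) \neq \underline{H}(\kappa_1)$ 
$\Psi(defyghilm) = \underline{H}(\kappa_1) \neq 0$ 
$\Psi(defyghilmn) = \underline{A}(\kappa_1 + \underline{i}) \neq 0$ 
$\Psi(defyghilmnD) = true$ 

Function $\Theta$: 

$\Theta(defyghilmnD)(\mathtt{i}) = \underline{i} + 1$ 
$\Theta(defyghilmnD)(\mathtt{j}) = \kappa_1 + \underline{i}$ 
$\Theta(defyghilmnD)(\mathtt{k}) = \kappa_1$ 
$\Theta(defyghilmnD)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 
$\Theta(defyghilmnD)(\mathtt{H}) = \lambda\chi \,.\, \underline{H}(\chi)$ 

Function $\Psi$ (with $\kappa_1$ replaced by $s_1$): 

$\Psi(d) = true$ 
$\Psi(de) = \underline{A}(\underline{i}) \neq 0$ 
$\Psi(def) = true$ 
$\Psi(defy) = true$ 
$\Psi(defyg) = \forall s\, [0 \le s < s_1 \rightarrow (\underline{H}(s) \neq 0 \wedge \underline{A}(s + \underline{i}) \neq 0 \wedge \underline{A}(s + \underline{i}) = \underline{H}(s))]$ 
$\Psi(defygh) = \underline{H}(s_1) \neq 0$ 
$\Psi(defyghi) = \underline{A}(s_1 + \underline{i}) \neq 0$ 
$\Psi(defyghil) = \underline{A}(s_1 + \underline{i}) \neq \underline{H}(s_1)$ 
$\Psi(defyghilm) = \underline{H}(s_1) \neq 0$ 
$\Psi(defyghilmn) = \underline{A}(s_1 + \underline{i}) \neq 0$ 
$\Psi(defyghilmnD) = true$ 

Function $\Theta$ (with $\kappa_1$ replaced by $s_1$): 

$\Theta(defyghilmnD)(\mathtt{i}) = \underline{i} + 1$ 
$\Theta(defyghilmnD)(\mathtt{j}) = s_1 + \underline{i}$ 
$\Theta(defyghilmnD)(\mathtt{k}) = s_1$ 
$\Theta(defyghilmnD)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 
$\Theta(defyghilmnD)(\mathtt{H}) = \lambda\chi \,.\, \underline{H}(\chi)$ 
$\Theta(defyghilmnD)(s_1) = \underline{s_1}$ 

Symbolic state $\theta^{\vec{\kappa}}$ after iteration of regular variables: 

$\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa_2 + \underline{i}$ 
$\theta^{\vec{\kappa}}(\mathtt{j}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{k}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 
$\theta^{\vec{\kappa}}(\mathtt{H}) = \lambda\chi \,.\, \underline{H}(\chi)$, 

where $\kappa_2$ is a fresh counter introduced for that single backbone path. 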
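Formula $\Gamma(defyg, \Omega, \theta^{\vec{\kappa}}, x)$ looks as follows: 

\[
\begin{aligned}
\Gamma(defyg, \Omega, \theta^{\vec{\kappa}}, x) = {} & (0 \le s_1 - 1 \rightarrow (\underline{H}(s_1 - 1, x) \neq 0 \wedge {} \\
& \qquad \underline{A}(s_1 - 1 + \kappa_2 + \underline{i}, x) \neq 0 \wedge {} \\
& \qquad \underline{A}(s_1 - 1 + \kappa_2 + \underline{i}, x) = \underline{H}(s_1 - 1, x))) \wedge {} \\
& \underline{H}(s_1, x) \neq 0 \wedge {} \\
& \underline{A}(s_1 + \kappa_2 + \underline{i}, x) \neq 0 \wedge {} \\
& \underline{A}(s_1 + \kappa_2 + \underline{i}, x) \neq \underline{H}(s_1, x)
\end{aligned}
\]

Note that we used $\Psi$, $\Theta$, and $\theta^{\vec{\kappa}}[s_1 \rightarrow \underline{s_1}]$ to compose $\Gamma(defyg, \Omega, \theta^{\vec{\kappa}}, x)$. Then formula $S_1$ is: 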
Si = 3 ( , f ll) V f I V k 2 , 2l (0 < k 2 A < 5l A Y(defyg, Q,6%,x))^ 



' mi \ / wi \ \ T 



rn.4 / \ tU4 



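Since $S_1$ is not satisfiable we have $s_1 = \star$. Therefore a fix-point $\theta^{\vec{\kappa}}$ is 

$\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa_2 + \underline{i}$ 
$\theta^{\vec{\kappa}}(\mathtt{j}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{k}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 
$\theta^{\vec{\kappa}}(\mathtt{H}) = \lambda\chi \,.\, \underline{H}(\chi)$ 
$\theta^{\vec{\kappa}}(s_1) = \star$ 

And the looping condition $\varphi^{\vec{\kappa}}$ is: 

\[
\begin{aligned}
\varphi^{\vec{\kappa}} = \forall\tau_2 [0 \le \tau_2 < \kappa_2 \rightarrow (\ & \underline{A}(\tau_2 + \underline{i}) \neq 0 \wedge {} \\
& \exists\kappa_1 (0 \le \kappa_1 \wedge {} \\
& \quad (\forall\tau_1 [0 \le \tau_1 < \kappa_1 \rightarrow (\underline{H}(\tau_1) \neq 0 \wedge \underline{A}(\tau_1 + \tau_2 + \underline{i}) \neq 0 \wedge \underline{A}(\tau_1 + \tau_2 + \underline{i}) = \underline{H}(\tau_1))]) \wedge {} \\
& \quad \underline{H}(\kappa_1) \neq 0 \wedge {} \\
& \quad \underline{A}(\kappa_1 + \tau_2 + \underline{i}) \neq 0 \wedge {} \\
& \quad \underline{A}(\kappa_1 + \tau_2 + \underline{i}) \neq \underline{H}(\kappa_1)))]
\end{aligned}
\]

Whole program After symbolic execution of a backbone tree of the program we receive the following function $\Psi$: 

$\Psi(a) = true$ 
$\Psi(ar) = true$ 
$\Psi(ars) = true$ 
$\Psi(arst) = true$ 
$\Psi(\ldots u) = true$ 
$\Psi(\ldots v) = true$ 
$\Psi(\ldots b) = true$ 
$\Psi(\ldots c) = true$ 
$\Psi(\ldots x) = true$ 
\[
\begin{aligned}
\Psi(\ldots d) = \forall\tau_2 [0 \le \tau_2 < \kappa_2 \rightarrow (\ & \underline{A}(\tau_2) \neq 0 \wedge {} \\
& \exists\kappa_1 (0 \le \kappa_1 \wedge {} \\
& \quad (\forall\tau_1 [0 \le \tau_1 < \kappa_1 \rightarrow (H'(\tau_1) \neq 0 \wedge \underline{A}(\tau_1 + \tau_2) \neq 0 \wedge \underline{A}(\tau_1 + \tau_2) = H'(\tau_1))]) \wedge {} \\
& \quad H'(\kappa_1) \neq 0 \wedge {} \\
& \quad \underline{A}(\kappa_1 + \tau_2) \neq 0 \wedge {} \\
& \quad \underline{A}(\kappa_1 + \tau_2) \neq H'(\kappa_1)))]
\end{aligned}
\]
$\Psi(\ldots e) = \underline{A}(\kappa_2) \neq 0$ 
$\Psi(\ldots f) = true$ 
$\Psi(\ldots y) = true$ 
$\Psi(\ldots g) = \forall\tau_3 [0 \le \tau_3 < \kappa_3 \rightarrow (H'(\tau_3) \neq 0 \wedge \underline{A}(\tau_3 + \kappa_2) \neq 0 \wedge \underline{A}(\tau_3 + \kappa_2) = H'(\tau_3))]$ 
$\Psi(\ldots l) = H'(\kappa_3) = 0$ 
$\Psi(\ldots o) = H'(\kappa_3) = 0$ 
$\Psi(\ldots p) = true$ 
$\Psi(\ldots q) = true$ 

Note that a backbone tree of the program has only a single backbone path arstuvbcxdefyglopq after its 
symbolic execution, since each backbone path going through some of the program edges (d,p), (g,h), (l,m) is 
not feasible. Because of space limitations we have abbreviated vertices of the backbone tree. For the 
same reason we have introduced a new function symbol $H' : int \rightarrow int$ representing the content of array H. 
It is defined as follows: $\forall\tau\ H'(\tau) = ite(\tau = 0, 72, ite(\tau = 1, 101, ite(\tau = 2, 108, ite(\tau = 3, 108, ite(\tau = 4, 111, ite(\tau = 5, 0, \underline{H}(\tau)))))))$. 
Note that at vertex $\ldots g$ we have recycled the result of the analysis of the 
inner loop. We have introduced a fresh path counter $\kappa_3$. 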
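Finally the abstraction $\varphi$ looks as follows 

\[
\begin{aligned}
\varphi = \exists\kappa_2 (0 \le \kappa_2 \wedge {} & \\
\forall\tau_2 [0 \le \tau_2 < \kappa_2 \rightarrow (\ & \underline{A}(\tau_2) \neq 0 \wedge {} \\
& \exists\kappa_1 (0 \le \kappa_1 \wedge {} \\
& \quad (\forall\tau_1 [0 \le \tau_1 < \kappa_1 \rightarrow (H'(\tau_1) \neq 0 \wedge \underline{A}(\tau_1 + \tau_2) \neq 0 \wedge \underline{A}(\tau_1 + \tau_2) = H'(\tau_1))]) \wedge {} \\
& \quad H'(\kappa_1) \neq 0 \wedge {} \\
& \quad \underline{A}(\kappa_1 + \tau_2) \neq 0 \wedge {} \\
& \quad \underline{A}(\kappa_1 + \tau_2) \neq H'(\kappa_1)))] \wedge {} \\
\underline{A}(\kappa_2) \neq 0 \wedge {} & \\
\exists\kappa_3 (0 \le \kappa_3 \wedge {} & \\
& \forall\tau_3 [0 \le \tau_3 < \kappa_3 \rightarrow (H'(\tau_3) \neq 0 \wedge \underline{A}(\tau_3 + \kappa_2) \neq 0 \wedge \underline{A}(\tau_3 + \kappa_2) = H'(\tau_3))] \wedge {} \\
& H'(\kappa_3) = 0))
\end{aligned}
\]

We ask an SMT solver for a model. A model of the formula defines a symbolic input for array A that contains the 
string "Hello", which navigates symbolic execution directly to the target location. 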
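To make the meaning of the abstraction for Hello more tangible, it can be evaluated on concrete inputs. The following is a minimal sketch under stated assumptions: A is modelled as a zero-terminated string (every position past its end reads as 0), the mismatch literal at position $\kappa_1$ is read as a disequality, and the helpers `Hp`, `at`, `match_len` and `satisfies_phi` are hypothetical names. It is a direct evaluation of the formula, not the SMT-based procedure used in the paper.

```cpp
#include <cassert>
#include <string>

// H' from the text: the codes of "Hello" followed by the terminating 0.
// Only arguments 0..5 are ever needed by the evaluation below.
static int Hp(int t) {
    static const char hello[] = "Hello";
    return (0 <= t && t <= 5) ? hello[t] : 0;
}

// A read as a zero-terminated array: positions past the end yield 0.
static int at(const std::string& A, int p) {
    return (p >= 0 && p < static_cast<int>(A.size()))
               ? static_cast<unsigned char>(A[p]) : 0;
}

// Largest k such that for all t < k:
//   H'(t) != 0 && A(start + t) != 0 && A(start + t) == H'(t).
static int match_len(const std::string& A, int start) {
    int k = 0;
    while (Hp(k) != 0 && at(A, start + k) != 0 && at(A, start + k) == Hp(k))
        ++k;
    return k;
}

// Evaluates the abstraction: there is a position kappa2 where all of "Hello"
// matches (kappa3 = 5), and every earlier position tau2 fails by a genuine
// character mismatch that occurs before the end of A.
bool satisfies_phi(const std::string& A) {
    for (int k2 = 0; at(A, k2) != 0; ++k2) {
        int k = match_len(A, k2);
        if (Hp(k) == 0)
            return true;            // witness for kappa3 found at position k2
        if (at(A, k2 + k) == 0)
            return false;           // the mismatch is caused by the end of A
    }
    return false;
}
```

For example, `satisfies_phi("xxHello")` holds with $\kappa_2 = 2$, while `satisfies_phi("Hell")` does not, which matches the behaviour of the program in Figure 1 on these inputs.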



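A.4 MatrIR 

It took 1 min 36 s for Pex to reach the target location in the following program: 

int w = 0; 

for (int i = 0; i < m; ++i) { 
    int k = 0; 
    for (int j = i; j < n; ++j) 
        if (A[i][j] > 10 && A[i][j] < 100) 
            ++k; 
    if (k > 15) { 
        w = 1; 
        break; 
    } 
} 

if (m > 15 && n > 20 && w == 1) 
    assert(false); 

Figure 3: Program MatrIR 

Inner loop There are three backbone paths ghijkG, ghikG, and ghkG in a backbone tree of the inner 
loop. 

Function $\Psi$ looks as follows 

$\Psi(g) = true$ 
$\Psi(gh) = \underline{j} < \underline{n}$ 
$\Psi(ghi) = \underline{A}(\underline{i}, \underline{j}) > 10$ 
$\Psi(ghij) = \underline{A}(\underline{i}, \underline{j}) < 100$ 
$\Psi(ghijk) = true$ 
$\Psi(ghijkG) = true$ 
$\Psi(ghik) = \underline{A}(\underline{i}, \underline{j}) \ge 100$ 
$\Psi(ghikG) = true$ 
$\Psi(ghk) = \underline{A}(\underline{i}, \underline{j}) \le 10$ 
$\Psi(ghkG) = true$ 

Function $\Theta$ is 

$\Theta(ghijkG)(\mathtt{i}) = \underline{i}$    $\Theta(ghikG)(\mathtt{i}) = \underline{i}$    $\Theta(ghkG)(\mathtt{i}) = \underline{i}$ 
$\Theta(ghijkG)(\mathtt{j}) = \underline{j} + 1$    $\Theta(ghikG)(\mathtt{j}) = \underline{j} + 1$    $\Theta(ghkG)(\mathtt{j}) = \underline{j} + 1$ 
$\Theta(ghijkG)(\mathtt{k}) = \underline{k} + 1$    $\Theta(ghikG)(\mathtt{k}) = \underline{k}$    $\Theta(ghkG)(\mathtt{k}) = \underline{k}$ 
$\Theta(ghijkG)(\mathtt{n}) = \underline{n}$    $\Theta(ghikG)(\mathtt{n}) = \underline{n}$    $\Theta(ghkG)(\mathtt{n}) = \underline{n}$ 
$\Theta(ghijkG)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$    $\Theta(ghikG)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$    $\Theta(ghkG)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 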

Since = ^ and = 9, we receive the following iterated symbolic state 6 K 

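$\theta^{\vec{\kappa}}(\mathtt{i}) = \underline{i}$ 
$\theta^{\vec{\kappa}}(\mathtt{j}) = \kappa_{1,1} + \kappa_{1,2} + \kappa_{1,3} + \underline{j}$ 
$\theta^{\vec{\kappa}}(\mathtt{k}) = \kappa_{1,1} + \underline{k}$ 
$\theta^{\vec{\kappa}}(\mathtt{n}) = \underline{n}$ 
$\theta^{\vec{\kappa}}(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 

Note that we introduced path counters $\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3}$ for the backbone paths ghijkG, ghikG, ghkG 
respectively. And the looping condition $\varphi^{\vec{\kappa}}$ looks as follows 

\[
\begin{aligned}
\varphi^{\vec{\kappa}} = {} & \Big(\forall\tau_{1,1}\ 0 \le \tau_{1,1} < \kappa_{1,1} \rightarrow \exists \tbinom{\tau_{1,2}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,2}}{\tau_{1,3}} \le \tbinom{\kappa_{1,2}}{\kappa_{1,3}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j} < \underline{n} \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j}) > 10 \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j}) < 100\Big) \wedge {} \\
& \Big(\forall\tau_{1,2}\ 0 \le \tau_{1,2} < \kappa_{1,2} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,3}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,3}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j} < \underline{n} \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j}) \ge 100\Big) \wedge {} \\
& \Big(\forall\tau_{1,3}\ 0 \le \tau_{1,3} < \kappa_{1,3} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,2}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,2}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,2}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j} < \underline{n} \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{j}) \le 10\Big)
\end{aligned}
\]

Outer loop There is a single backbone path defxglmo in a backbone tree of the outer loop. 
Function $\Psi$ looks as follows 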



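$\Psi(d) = true$ 
$\Psi(de) = \underline{i} < \underline{m}$ 
$\Psi(def) = true$ 
$\Psi(\ldots x) = true$ 
\[
\begin{aligned}
\Psi(\ldots g) = {} & \Big(\forall\tau_{1,1}\ 0 \le \tau_{1,1} < \kappa_{1,1} \rightarrow \exists \tbinom{\tau_{1,2}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,2}}{\tau_{1,3}} \le \tbinom{\kappa_{1,2}}{\kappa_{1,3}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i} < \underline{n} \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i}) > 10 \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i}) < 100\Big) \wedge {} \\
& \Big(\forall\tau_{1,2}\ 0 \le \tau_{1,2} < \kappa_{1,2} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,3}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,3}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i} < \underline{n} \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i}) \ge 100\Big) \wedge {} \\
& \Big(\forall\tau_{1,3}\ 0 \le \tau_{1,3} < \kappa_{1,3} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,2}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,2}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,2}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i} < \underline{n} \wedge {} \\
& \quad \underline{A}(\underline{i}, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \underline{i}) \le 10\Big)
\end{aligned}
\]
$\Psi(\ldots l) = \kappa_{1,1} + \kappa_{1,2} + \kappa_{1,3} + \underline{i} \ge \underline{n}$ 
$\Psi(\ldots m) = \kappa_{1,1} \le 15$ 
$\Psi(\ldots o) = true$ 

Function $\Theta$ is 

$\Theta(defxglmo)(\mathtt{i}) = \underline{i} + 1$ 
$\Theta(defxglmo)(\mathtt{j}) = \kappa_{1,1} + \kappa_{1,2} + \kappa_{1,3} + \underline{i}$ 
$\Theta(defxglmo)(\mathtt{k}) = \kappa_{1,1}$ 
$\Theta(defxglmo)(\mathtt{m}) = \underline{m}$ 
$\Theta(defxglmo)(\mathtt{n}) = \underline{n}$ 
$\Theta(defxglmo)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 

Function $\Psi$ (with the imported counters replaced by $s_1$) looks as follows 

$\Psi(d) = true$ 
$\Psi(de) = \underline{i} < \underline{m}$ 
$\Psi(def) = true$ 
$\Psi(\ldots x) = true$ 
$\Psi(\ldots g) = \forall s\, (0 \le s < s_1 \rightarrow s + \underline{i} < \underline{n})$ 
$\Psi(\ldots l) = s_1 + \underline{i} \ge \underline{n}$ 
$\Psi(\ldots m) = \star \le 15$ 
$\Psi(\ldots o) = true$ 

Function $\Theta$ (with the imported counters replaced by $s_1$) is 

$\Theta(defxglmo)(\mathtt{i}) = \underline{i} + 1$ 
$\Theta(defxglmo)(\mathtt{j}) = s_1 + \underline{i}$ 
$\Theta(defxglmo)(\mathtt{k}) = \star$ 
$\Theta(defxglmo)(\mathtt{m}) = \underline{m}$ 
$\Theta(defxglmo)(\mathtt{n}) = \underline{n}$ 
$\Theta(defxglmo)(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 

After iteration of regular program variables we receive the following $\theta^{\vec{\kappa}}$ 

$\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa_2 + \underline{i}$ 
$\theta^{\vec{\kappa}}(\mathtt{j}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{k}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{m}) = \underline{m}$ 
$\theta^{\vec{\kappa}}(\mathtt{n}) = \underline{n}$ 
$\theta^{\vec{\kappa}}(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 

Function $\Gamma(defxg, \Omega, \theta^{\vec{\kappa}}(\mathtt{A}), x)$ looks as follows 

$\Gamma(defxg, \Omega, \theta^{\vec{\kappa}}(\mathtt{A}), x) = (0 \le s_1 - 1 \rightarrow s_1 - 1 + \kappa_2 + \underline{i} < \underline{n}) \wedge s_1 + \kappa_2 + \underline{i} \ge \underline{n}$ 

And therefore the formula $S_1$ is 

\[
\begin{aligned}
S_1 = \exists (m_1, m_2, m_3)^T, (w_1, w_2, w_3)^T\ \forall (\underline{n}, \underline{i})^T, \kappa_2, s_1\ \Big(&(\kappa_2 \ge 0 \wedge s_1 \ge 0 \wedge (0 \le s_1 - 1 \rightarrow s_1 - 1 + \kappa_2 + \underline{i} < \underline{n}) \wedge {} \\
& s_1 + \kappa_2 + \underline{i} \ge \underline{n}) \rightarrow s_1 = \max\{0, ((m_1, m_2, m_3)^T \kappa_2 + (w_1, w_2, w_3)^T)^T (\underline{n}, \underline{i}, 1)^T\}\Big)
\end{aligned}
\]

After we get a model from an SMT solver we build the following solution for $s_1$ 

$s_1 = \max\{0, \underline{n} - \kappa_2 - \underline{i}\}$ 

Therefore the fix-point $\theta^{\vec{\kappa}}$ is 

$\theta^{\vec{\kappa}}(\mathtt{i}) = \kappa_2 + \underline{i}$ 
$\theta^{\vec{\kappa}}(\mathtt{j}) = \max\{0, \underline{n} - \kappa_2 - \underline{i}\} + \underline{i}$ 
$\theta^{\vec{\kappa}}(\mathtt{k}) = \star$ 
$\theta^{\vec{\kappa}}(\mathtt{m}) = \underline{m}$ 
$\theta^{\vec{\kappa}}(\mathtt{n}) = \underline{n}$ 
$\theta^{\vec{\kappa}}(\mathtt{A}) = \lambda\chi \,.\, \underline{A}(\chi)$ 

The looping condition $\varphi^{\vec{\kappa}}$ of the outer loop looks as follows 

\[
\begin{aligned}
\varphi^{\vec{\kappa}} = \forall\tau_2\ 0 \le \tau_2 < \kappa_2 \rightarrow \Big(&\tau_2 + \underline{i} < \underline{m} \wedge \exists (\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3})^T\ (0, 0, 0)^T \le (\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3})^T \wedge {} \\
&\Psi(defxg) \wedge \kappa_{1,1} + \kappa_{1,2} + \kappa_{1,3} + \underline{i} \ge \underline{n} \wedge \kappa_{1,1} \le 15\Big)
\end{aligned}
\]

Whole program The backbone tree of the program has only a single backbone path abcdefxglnopqr after 
its symbolic execution, since a backbone path going through program edge (d, o) is infeasible. Therefore 
function $\Psi$ looks as follows: 

$\Psi(a) = true$ 
$\Psi(ab) = true$ 
$\Psi(abc) = true$ 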
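\[
\begin{aligned}
\Psi(\ldots d) = \forall\tau_2\ 0 \le \tau_2 < \kappa_2 \rightarrow \Big(&\tau_2 < \underline{m} \wedge \exists (\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3})^T\ (0, 0, 0)^T \le (\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3})^T \wedge {} \\
& \Big(\forall\tau_{1,1}\ 0 \le \tau_{1,1} < \kappa_{1,1} \rightarrow \exists \tbinom{\tau_{1,2}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,2}}{\tau_{1,3}} \le \tbinom{\kappa_{1,2}}{\kappa_{1,3}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2 < \underline{n} \wedge {} \\
& \quad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) > 10 \wedge {} \\
& \quad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) < 100\Big) \wedge {} \\
& \Big(\forall\tau_{1,2}\ 0 \le \tau_{1,2} < \kappa_{1,2} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,3}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,3}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2 < \underline{n} \wedge {} \\
& \quad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) \ge 100\Big) \wedge {} \\
& \Big(\forall\tau_{1,3}\ 0 \le \tau_{1,3} < \kappa_{1,3} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,2}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,2}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,2}} \wedge {} \\
& \quad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2 < \underline{n} \wedge {} \\
& \quad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) \le 10\Big) \wedge {} \\
& \kappa_{1,1} + \kappa_{1,2} + \kappa_{1,3} + \tau_2 \ge \underline{n} \wedge \kappa_{1,1} \le 15\Big)
\end{aligned}
\]
$\Psi(\ldots e) = \kappa_2 < \underline{m}$ 
$\Psi(\ldots f) = true$ 
$\Psi(\ldots x) = true$ 
\[
\begin{aligned}
\Psi(\ldots g) = {} & \Big(\forall\tau_{3,1}\ 0 \le \tau_{3,1} < \kappa_{3,1} \rightarrow \exists \tbinom{\tau_{3,2}}{\tau_{3,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{3,2}}{\tau_{3,3}} \le \tbinom{\kappa_{3,2}}{\kappa_{3,3}} \wedge {} \\
& \quad \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2 < \underline{n} \wedge {} \\
& \quad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) > 10 \wedge {} \\
& \quad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) < 100\Big) \wedge {} \\
& \Big(\forall\tau_{3,2}\ 0 \le \tau_{3,2} < \kappa_{3,2} \rightarrow \exists \tbinom{\tau_{3,1}}{\tau_{3,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{3,1}}{\tau_{3,3}} \le \tbinom{\kappa_{3,1}}{\kappa_{3,3}} \wedge {} \\
& \quad \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2 < \underline{n} \wedge {} \\
& \quad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) \ge 100\Big) \wedge {} \\
& \Big(\forall\tau_{3,3}\ 0 \le \tau_{3,3} < \kappa_{3,3} \rightarrow \exists \tbinom{\tau_{3,1}}{\tau_{3,2}}\ \tbinom{0}{0} \le \tbinom{\tau_{3,1}}{\tau_{3,2}} \le \tbinom{\kappa_{3,1}}{\kappa_{3,2}} \wedge {} \\
& \quad \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2 < \underline{n} \wedge {} \\
& \quad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) \le 10\Big)
\end{aligned}
\]
$\Psi(\ldots l) = \kappa_{3,1} + \kappa_{3,2} + \kappa_{3,3} + \kappa_2 \ge \underline{n}$ 
$\Psi(\ldots n) = \kappa_{3,1} > 15$ 
$\Psi(\ldots o) = true$ 
$\Psi(\ldots p) = \underline{m} > 15$ 
$\Psi(\ldots q) = \underline{n} > 20$ 
$\Psi(\ldots r) = true$ 

And we receive the following abstraction $\varphi$ 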

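\[
\begin{aligned}
\varphi = \exists\kappa_2 \Big(&0 \le \kappa_2 \wedge {} \\
&\forall\tau_2\ 0 \le \tau_2 < \kappa_2 \rightarrow \Big( \\
& \quad \tau_2 < \underline{m} \wedge \exists (\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3})^T\ (0, 0, 0)^T \le (\kappa_{1,1}, \kappa_{1,2}, \kappa_{1,3})^T \wedge {} \\
& \quad \Big(\forall\tau_{1,1}\ 0 \le \tau_{1,1} < \kappa_{1,1} \rightarrow \exists \tbinom{\tau_{1,2}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,2}}{\tau_{1,3}} \le \tbinom{\kappa_{1,2}}{\kappa_{1,3}} \wedge {} \\
& \qquad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2 < \underline{n} \wedge {} \\
& \qquad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) > 10 \wedge {} \\
& \qquad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) < 100\Big) \wedge {} \\
& \quad \Big(\forall\tau_{1,2}\ 0 \le \tau_{1,2} < \kappa_{1,2} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,3}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,3}} \wedge {} \\
& \qquad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2 < \underline{n} \wedge {} \\
& \qquad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) \ge 100\Big) \wedge {} \\
& \quad \Big(\forall\tau_{1,3}\ 0 \le \tau_{1,3} < \kappa_{1,3} \rightarrow \exists \tbinom{\tau_{1,1}}{\tau_{1,2}}\ \tbinom{0}{0} \le \tbinom{\tau_{1,1}}{\tau_{1,2}} \le \tbinom{\kappa_{1,1}}{\kappa_{1,2}} \wedge {} \\
& \qquad \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2 < \underline{n} \wedge {} \\
& \qquad \underline{A}(\tau_2, \tau_{1,1} + \tau_{1,2} + \tau_{1,3} + \tau_2) \le 10\Big) \wedge {} \\
& \quad \kappa_{1,1} + \kappa_{1,2} + \kappa_{1,3} + \tau_2 \ge \underline{n} \wedge {} \\
& \quad \kappa_{1,1} \le 15\Big) \wedge {} \\
&\kappa_2 < \underline{m} \wedge {} \\
&\exists (\kappa_{3,1}, \kappa_{3,2}, \kappa_{3,3})^T \Big((0, 0, 0)^T \le (\kappa_{3,1}, \kappa_{3,2}, \kappa_{3,3})^T \wedge {} \\
& \quad \Big(\forall\tau_{3,1}\ 0 \le \tau_{3,1} < \kappa_{3,1} \rightarrow \exists \tbinom{\tau_{3,2}}{\tau_{3,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{3,2}}{\tau_{3,3}} \le \tbinom{\kappa_{3,2}}{\kappa_{3,3}} \wedge {} \\
& \qquad \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2 < \underline{n} \wedge {} \\
& \qquad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) > 10 \wedge {} \\
& \qquad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) < 100\Big) \wedge {} \\
& \quad \Big(\forall\tau_{3,2}\ 0 \le \tau_{3,2} < \kappa_{3,2} \rightarrow \exists \tbinom{\tau_{3,1}}{\tau_{3,3}}\ \tbinom{0}{0} \le \tbinom{\tau_{3,1}}{\tau_{3,3}} \le \tbinom{\kappa_{3,1}}{\kappa_{3,3}} \wedge {} \\
& \qquad \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2 < \underline{n} \wedge {} \\
& \qquad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) \ge 100\Big) \wedge {} \\
& \quad \Big(\forall\tau_{3,3}\ 0 \le \tau_{3,3} < \kappa_{3,3} \rightarrow \exists \tbinom{\tau_{3,1}}{\tau_{3,2}}\ \tbinom{0}{0} \le \tbinom{\tau_{3,1}}{\tau_{3,2}} \le \tbinom{\kappa_{3,1}}{\kappa_{3,2}} \wedge {} \\
& \qquad \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2 < \underline{n} \wedge {} \\
& \qquad \underline{A}(\kappa_2, \tau_{3,1} + \tau_{3,2} + \tau_{3,3} + \kappa_2) \le 10\Big) \wedge {} \\
& \quad \kappa_{3,1} + \kappa_{3,2} + \kappa_{3,3} + \kappa_2 \ge \underline{n} \wedge {} \\
& \quad \kappa_{3,1} > 15 \wedge {} \\
& \quad \underline{m} > 15 \wedge {} \\
& \quad \underline{n} > 20\Big)\Big)
\end{aligned}
\]

From a model returned by an SMT solver we can see that we have an input to the program which reaches 
the target location. Note that although the Z3 SMT solver correctly computed the content of the array A (i.e. there 
are numbers 11 everywhere), the size n of the array A is unnecessarily large: 1257. 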
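The remark about the unnecessarily large n can be illustrated by running program MatrIR on matrices filled with the value 11. The sketch below wraps the program in a function that returns whether the assertion location would be reached; `Matrix`, `filled` and `reaches_target` are hypothetical names, and the concrete sizes used afterwards are our own choice rather than values taken from the experiments.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Builds a rows x cols matrix with every element equal to value.
Matrix filled(int rows, int cols, int value) {
    return Matrix(rows, std::vector<int>(cols, value));
}

// Program MatrIR with the final assert(false) replaced by a return value:
// true means that the target location is reached.
bool reaches_target(const Matrix& A, int m, int n) {
    int w = 0;
    for (int i = 0; i < m; ++i) {
        int k = 0;
        for (int j = i; j < n; ++j)
            if (A[i][j] > 10 && A[i][j] < 100)
                ++k;
        if (k > 15) {
            w = 1;
            break;
        }
    }
    return m > 15 && n > 20 && w == 1;
}
```

With all elements equal to 11, the sizes m = 16 and n = 21 already reach the target, so a model with n = 1257 is valid but far from minimal.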